arjunmehta / node-geo-proximity

Super fast proximity searches of geo coordinates.
MIT License
132 stars 18 forks source link

Limiting items #16

Closed etopian closed 9 years ago

etopian commented 9 years ago

Is it possible to add something to limit the number of items returned, and perhaps save on computation because of that? I potentially have a few million items being returned, and I would prefer it to return only a few hundred. Thanks.

arjunmehta commented 9 years ago

Hi there @etopian.

Would you be able to expand on how you're using the module? Would you expect pagination? It's possible to LIMIT the query results to any number but this would be difficult to do right. The few hundred items returned would be completely arbitrary.

Either way this would be low priority on the list of things to implement. (along with other projects I'm working on :))

Can you consider using multiple sets? Or reducing your radius size? This will let you reduce the number of returned results.

etopian commented 9 years ago

Pagination would be nice, but limiting items is necessary because in this application people add items and those are then queried. There is no way of knowing in a given region how many items there are. I am worried of a ton of items near each other exhausting the memory of the NodeJS process. But I guess I will just wait for now and see if this becomes a problem and then fix it. Or perhaps switch to MongoDB depending on how performant it is.

arjunmehta commented 9 years ago

Are people adding items to a single collective set? Is there any way to break the set up into subsets? If you can typify your "items" you can have subsets for each item type. This will allow you to control your queries.

Perhaps you're designing for a scale that you're not at yet? It should be really easy to switch to MongoDB in the future. Or it may be the right choice now if it offers you the ability to paginate replies.

etopian commented 9 years ago

Are people adding? Yes.

No there is no way to break it into subsets.

I am designing things to scales that don't yet exist.

Apparently MongoDB had similar problems, here is how they were solved for reference. http://emptysqua.re/blog/paging-geo-mongodb/

arjunmehta commented 9 years ago

@etopian Thanks for the reference! Unfortunately this module doesn't work the same way mongodb does in that there is no ordering by distance.

I'm going to close this issue for now. Please feel free to try contributing to the module if you think others might benefit from the ability to page through results!

etopian commented 9 years ago

Can you provide an explanation or a reference about how it works? Perhaps add it to the project page. I might take a crack at it, if you catch me up.

arjunmehta commented 9 years ago

The readme has a link to a gis stackexchange answer that explains it: http://gis.stackexchange.com/questions/18330/would-it-be-possible-to-use-geohash-for-proximity-searches/92331#92331

Also further reading here: http://23.239.12.206:8000/posts/2014-04-02-geohash-proximity-pt1.html

Thanks!

arjunmehta commented 9 years ago

I'll actually reopen this issue, and consider getting to it if the time comes up :) But please feel free to explore. If you do, just fork the module and see if you can get it to work for your needs. And of course share your results!

etopian commented 9 years ago

Apparently https://github.com/yinqiwen/ardb/blob/master/doc/spatial-index.md is using geohashing and has already implemented limits.

arjunmehta commented 9 years ago

@etopian Have you had a chance to look at the source for this module and compare it to ardb's geo module?

I gave it a quick scan and it looks like the "limiting" is not limiting the redis query results, so you'll still get all the results loaded in memory, at least initially. The limit in ardb is forced by discarding the points returned over the limit. This saves no processing (it actually increases CPU use), nor query bandwidth, but will potentially save memory for an initial search. But really you could implement something of your own by just slicing and discarding the results over a certain limit anyway.

How I would want to implement it would be by limiting the query. This would require some modulus math to split the limit_value up and distribute it across the the queries (node-geo-proximity does 4-5 redis queries for each proximity search). It would also require a next() method or a way to continue the query from a cursor, and of course be able to take any arbitrary start/end values and consistently be able to start and end so all results are eventually returned consistently.

Unit tests would need to be made, and some edge case scenarios would need to be considered (ie. what happens when I new point is added to a query that is being paginated? Some points might get missed, but maybe that's okay).

So, obviously it's not as simple as adding or copying the feature from another library. Every implementation of 2D spatial index is done differently and requires its own tricks/algorithms to implement features.

I like the idea of being able to paginate/limit results, but I don't have the time to implement right now, unfortunately.

I do plan on doing a v3.0.x version of this module so, I will consider adding this feature for that.

etopian commented 9 years ago

I have not had a look at ardb's geo stuff. I looked, but didn't grok it. In any case it's interesting because it's not just a memory store like Redis. It persists to disk. Using the method you used is it possible to do a query further than a certain distance? If so then that would make it relatively easy to page, much like the MongoDB example above. You would just look up small increments of distances away, say like 1km to 2km... and add the results to a Redis ordered list. I personally am thinking of using Ardb for my project because of the fact that it's not just a memory store like Redis and it's just as fast.

arjunmehta commented 9 years ago

@etopian I'm releasing a new version of this module as an entirely new module, georedis. Have a look at this new API.

Redis has added native geo commands (currently in beta) and this module will take advantage of them. It also falls back to provide all functions redis geo commands natively provide, but It is very slow when running in this emulation mode. Still, may be interesting to you. Would be nice if you'd like to beta test: npm install georedis

Have a look, and tell me what you think. You may want to consider testing out the new native geo commands.

arjunmehta commented 9 years ago

Will close this issue for now.

etopian commented 9 years ago

cool, thanks Arjun.