abscondment / clj-kdtree

kd-trees in Clojure
MIT License
57 stars 11 forks source link

Support other distance metrics #11

Open eigenhombre opened 9 years ago

eigenhombre commented 9 years ago

Currently the (square of the) Euclidean distance is the only metric allowed, with the simplest (i.e., non-wrapping) topology. I'd be interested in getting this to work on e.g. a sphere with the Haversine (great circle) distance. Use cases for this include distances on the Earth (GIS). @abscondment would you welcome a PR for that?

eigenhombre commented 9 years ago

Followup -- I started working on supporting additional distance functions. API question -- the current API exports a :dist-squared keyword on the points returned by nearest-neighbor, which obviously implies Euclidean distance (squared). In my current approach I am just renaming this to :dist-result, but this will break the API for anyone who relies on the old keyword. Alternatively, we could return :dist-squared when the default, Euclidiean distance is used, and :dist-result otherwise, though that seems a bit hacky to me. I checked the couple of projects that use clj-kdtree on crossclj.info and didn't find anyone using this keyword. Thoughts?

nbeloglazov commented 9 years ago

:dist-squared used both internally and exported as API. It would make sense to have just :dist without squared, but having squared optimizes things a little bit as you don't need to calculate sqrt at all. It's probably makes sense to have additional step before you give result back to user which replaces all :dist-squared with :dist by calculating sqrt. This way internally we can still use dist-squared as optimization and also have easier to understand API.

BTW, will yourchanges to introduce additional metrics support this 2 notions of distances for same metric (if needed), like normal dist for public API and "squared" internally?

abscondment commented 9 years ago

@eigenhombre Yes, I'd absolutely welcome a PR!

I like the notion of having the API expose :dist for when you need to actually know that value -- i.e. maybe even compute it lazily -- but allow for internal comparison via a potentially faster implementation.

I'm not sure how widespread use of this library is, but it might be friendlier to deprecate :dist-squared for one release and provide both fields. Thoughts on that? I've no strong opinion either way.

eigenhombre commented 9 years ago

I like the more general :dist value, possibly computed lazily when needed from :dist-squared (or other internal metric value in the tree). However, I have a bigger worry. For spherical geometry the algorithm needs to "wrap" continuously at 0 and 2π (or -180 & 180 degrees longitude) and I haven't had the time to figure out how to accommodate that (see my question on cs.stackexchange, which is what led me to this library in the first place). This, along with the Haversine metric for calculating great circle distance, is what's needed to do nearest neighbor correctly on the sphere. I wonder if the API should accommodate both geometric (metric) and topological (connectedness) concerns. For example, one straightforward application could be a flat (Euclidean) metric in an n-dimensional torus (think the old school video game Asteroids as a use case). Any thoughts on whether it would be worth supporting connected edges in the API (aside from my use case, which I definitely need it for)?