Open eigenhombre opened 9 years ago
Followup -- I started working on supporting additional distance functions. API question -- the current API exports a :dist-squared
keyword on the points returned by nearest-neighbor
, which obviously implies Euclidean distance (squared). In my current approach I am just renaming this to :dist-result
, but this will break the API for anyone who relies on the old keyword. Alternatively, we could return :dist-squared
when the default, Euclidiean distance is used, and :dist-result
otherwise, though that seems a bit hacky to me. I checked the couple of projects that use clj-kdtree
on crossclj.info and didn't find anyone using this keyword. Thoughts?
:dist-squared
used both internally and exported as API. It would make sense to have just :dist
without squared, but having squared optimizes things a little bit as you don't need to calculate sqrt at all. It's probably makes sense to have additional step before you give result back to user which replaces all :dist-squared
with :dist
by calculating sqrt. This way internally we can still use dist-squared
as optimization and also have easier to understand API.
BTW, will yourchanges to introduce additional metrics support this 2 notions of distances for same metric (if needed), like normal dist for public API and "squared" internally?
@eigenhombre Yes, I'd absolutely welcome a PR!
I like the notion of having the API expose :dist
for when you need to actually know that value -- i.e. maybe even compute it lazily -- but allow for internal comparison via a potentially faster implementation.
I'm not sure how widespread use of this library is, but it might be friendlier to deprecate :dist-squared
for one release and provide both fields. Thoughts on that? I've no strong opinion either way.
I like the more general :dist
value, possibly computed lazily when needed from :dist-squared
(or other internal metric value in the tree). However, I have a bigger worry. For spherical geometry the algorithm needs to "wrap" continuously at 0 and 2π (or -180 & 180 degrees longitude) and I haven't had the time to figure out how to accommodate that (see my question on cs.stackexchange, which is what led me to this library in the first place). This, along with the Haversine metric for calculating great circle distance, is what's needed to do nearest neighbor correctly on the sphere. I wonder if the API should accommodate both geometric (metric) and topological (connectedness) concerns. For example, one straightforward application could be a flat (Euclidean) metric in an n-dimensional torus (think the old school video game Asteroids as a use case). Any thoughts on whether it would be worth supporting connected edges in the API (aside from my use case, which I definitely need it for)?
Currently the (square of the) Euclidean distance is the only metric allowed, with the simplest (i.e., non-wrapping) topology. I'd be interested in getting this to work on e.g. a sphere with the Haversine (great circle) distance. Use cases for this include distances on the Earth (GIS). @abscondment would you welcome a PR for that?