KristofferC / NearestNeighbors.jl

High performance nearest neighbor data structures (KDTree and BallTree) and algorithms for Julia.
Other
418 stars 65 forks source link

Add tree walking functions #199

Open dgleich opened 3 months ago

dgleich commented 3 months ago

This is an initial take at how to setup the tree walking codes.

This addresses #194.

I need to add more documentation still, but this should be enough to get some initial feedback before writing more docs.

dgleich commented 3 months ago

I switched the names to ...

leafpoints, leaf_points_indices, treeindex ... I'm also thinking that using treeroot would be more consistent with these than root...

KristofferC commented 3 months ago

Random thought, would it make sense to implement the https://github.com/JuliaCollections/AbstractTrees.jl interface for the trees in this package?

dgleich commented 3 months ago

Good question. Let me think....

dgleich commented 3 months ago

On a related note, currently, we can implement a parent function for BallTree nodes because it stores the associated regions explicitly, but I don't see an obvious way to do this for KDTree nodes without storing the min and max values for the dimension that was split in the tree.

There are some key advantages to having a parent function, e.g. then you can do tree traversal and iteration without a stack. (And I think some of the abstract trees methods would need this...)

On the other hand, for pure NearestNeighbors functions, storing this extra information is unneeded.

How amenable would you be to adding a split_minmax value to the KDTree struct that stores that information?

This would just store a tuple of values for the boundaries of the dimension that is split so they can be restored via a parent call.

KristofferC commented 3 months ago

As a check of the functionality here it would be nice to reimplement https://github.com/KristofferC/NearestNeighbors.jl/blob/master/examples/balltree_illustration.ipynb using these official traverse functions. Doesn't strictly have to be done here but it would serve as somewhat of a use case check.