clbarnes / nblast-rs

GNU General Public License v3.0

all-nearest neighbours algorithm #32

Open schlegelp opened 7 months ago

schlegelp commented 7 months ago

I'm tempted to implement this to see if it might be useful for NBLAST: "All Nearest Neighbour Calculation Based on Delaunay Graphs" (arXiv)

Have you by any chance already looked into/tried something like this @clbarnes @aschampion @jefferis?

schlegelp commented 7 months ago

Cobbled together a working prototype: https://github.com/schlegelp/aann

Quick & dirty benchmarks finding all nearest-neighbours between two neurons (120k and 23k nodes, respectively):

I think that looks promising. The Rust implementation gave me some headaches, and there is definitely room for improvement if you have time to take a look.

schlegelp commented 7 months ago

Quick update:

I managed to get the Rust implementation from 1.5s down to ~0.5s in my test scenario. That's closer to pykdtree (0.25s) but still not good enough. I should also say that timing really depends on the neurons used for the comparison, and I need to run a larger-scale benchmark for proper testing.

At this point I'm a bit stuck: with my limited experience in Rust I don't really see anything else that could be optimised. I also wrote another implementation in Cython (where I have slightly more experience) but couldn't push it below ~0.5s either.

schlegelp commented 7 months ago

All-by-all query with 100 random neurons from the hemibrain dataset:

|               | build [s] | query [s] | total [s] |
|---------------|-----------|-----------|-----------|
| scipy cKDTree | 0.235956  | 276.087   | 276.323   |
| pykdtree      | 0.113741  | 37.4036   | 37.5174   |
| delaunay      | 16.9499   | 46.1051   | 63.055    |
| delaunay_skel | 15.6494   | 19.664    | 35.3134   |

delaunay_skel uses the skeleton topology instead of a full-blown Delaunay triangulation to build the local neighbourhood graph. This leads to smaller neighbourhoods, which means faster query times but less accurate nearest neighbours (needs testing). The build process for this could be optimised.
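For reference, the query side of this is just a greedy walk over the neighbourhood graph: hop to whichever neighbour is closer to the query point until nothing improves. A minimal sketch of that idea (illustrative names, not the actual aann code):

```rust
/// Squared Euclidean distance between two 3D points.
fn sq_dist(a: &[f64; 3], b: &[f64; 3]) -> f64 {
    (0..3).map(|k| (a[k] - b[k]).powi(2)).sum()
}

/// Greedy descent on a neighbourhood graph: from `seed`, repeatedly
/// move to the adjacent vertex closest to `query` until no neighbour
/// improves. On a true Delaunay graph the local minimum reached is
/// the exact nearest neighbour; on the skeleton graph it may not be.
fn greedy_nn(
    points: &[[f64; 3]],
    neighbours: &[Vec<usize>],
    seed: usize,
    query: &[f64; 3],
) -> (usize, f64) {
    let mut best = seed;
    let mut best_d = sq_dist(&points[best], query);
    loop {
        let mut improved = false;
        for &n in &neighbours[best] {
            let d = sq_dist(&points[n], query);
            if d < best_d {
                best = n;
                best_d = d;
                improved = true;
            }
        }
        if !improved {
            return (best, best_d.sqrt());
        }
    }
}

fn main() {
    // Toy example: a three-node chain 0-1-2 along the x axis.
    let points = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]];
    let neighbours = vec![vec![1], vec![0, 2], vec![1]];
    let (idx, dist) = greedy_nn(&points, &neighbours, 0, &[1.9, 0.0, 0.0]);
    assert_eq!(idx, 2);
    println!("nearest neighbour: {idx} at distance {dist:.2}");
}
```

As far as I understand the approach, the speed-up over independent k-d tree queries comes from seeding: consecutive skeleton nodes are spatially close, so each query can start the walk from the previous node's result and typically terminates within a few hops.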

clbarnes commented 7 months ago

I had a look around and it seems there aren't any published 3D Delaunay triangulation implementations in Rust :cry: The tree topology wouldn't be available for LM->EM conversions and, as you say, would be inaccurate in cases where you need to hop between nearby branches.

jefferis commented 7 months ago

I was always interested in trying to find/make an all-by-all implementation. Nevertheless, my reading of the paper you linked was that the advantage in their Java implementation only really appeared at the scale of ~100,000 points. I think in general the construction and querying of k-d indices is very efficient in the best nearest-neighbour packages – they have been very professionally optimised. Therefore we, as amateurs in this area, can probably only hope to be competitive when the big-O scaling really starts to tip in favour of the all-by-all approach, i.e. with a lot of points. But since we should normally be operating in the 200-5,000 point range, I think we are unlikely to get a major speed-up.

> All-by-all query with 100 random neurons from the hemibrain dataset:

Do you know the range/distribution of points-per-neuron in this sample?

clbarnes commented 7 months ago

> the advantage in their Java implementation only really appeared at the scale of ~100,000 points

Yes, doesn't Figure 1 suggest that tree-based implementations are CPU-faster when working in the 0-10k point range, which is mainly where we're working?

schlegelp commented 7 months ago

> I had a look around and it seems there aren't any published 3D Delaunay triangulation implementations in Rust

I thought I had seen at least one. What about `simple_delaunay_lib` or `d3_geo_voronoi_rs`?

> would be inaccurate in cases where you need to hop between nearby branches

Exactly. We could ameliorate the problem by introducing extra edges, but that would increase the query time again. Not sure how often that makes a massive difference though - will test.

> Therefore we, as amateurs in this area, can probably only hope to be competitive when the big-O scaling really starts to tip in favour

Definitely true for me but I have faith in @clbarnes and @aschampion ;)

I was, for example, thinking about using SIMD (single instruction, multiple data) for the distance calculations - likely one of the bottlenecks - but that turned out to be too advanced for me.

FWIW: I came across a couple of other papers on this topic. One described an implementation using R-trees, but optimising the search via the local neighbourhood to speed up all-by-all queries.

> Do you know the range/distribution of points-per-neuron in this sample?

I'm running a larger benchmark at the moment. Will share the distribution once that's finished.

schlegelp commented 7 months ago

This is the distribution of node counts for the sample of 100 neurons:

*[figure `sample_node_hist`: histogram of node counts across the sampled neurons]*

Average size is ~5.5k nodes.

> advantage in their Java implementation only really appeared at the scale of ~100,000 points

I just quickly tried combining all 100 neurons into one large point cloud to use as the query - i.e. instead of 100 queries × 100 targets, I ran 1 combined query × 100 targets and split the results back into the original neurons afterwards. Unfortunately that doesn't seem to make much of a difference: it looks like the size of the target is the determining factor.
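The bookkeeping for that experiment is trivial either way - roughly this, with `all_nn` as a hypothetical stand-in for whichever backend actually runs the queries (a sketch, not the benchmark code):

```rust
/// Pool several query clouds into one, run a single all-NN pass
/// against the target, then split the flat result back per neuron.
/// `all_nn` is a stand-in for the real query backend.
fn pooled_query(
    queries: &[Vec<[f64; 3]>],
    all_nn: impl Fn(&[[f64; 3]]) -> Vec<f64>,
) -> Vec<Vec<f64>> {
    // Flatten, remembering where each neuron's points start.
    let mut pooled: Vec<[f64; 3]> = Vec::new();
    let mut offsets = vec![0];
    for q in queries {
        pooled.extend_from_slice(q);
        offsets.push(pooled.len());
    }
    // One pass over the combined cloud...
    let flat = all_nn(&pooled);
    // ...then split the distances back into the original neurons.
    offsets.windows(2).map(|w| flat[w[0]..w[1]].to_vec()).collect()
}

fn main() {
    // Dummy backend: distance of each point to the origin.
    let fake_nn = |pts: &[[f64; 3]]| -> Vec<f64> {
        pts.iter()
            .map(|p| (p[0] * p[0] + p[1] * p[1] + p[2] * p[2]).sqrt())
            .collect()
    };
    let queries = vec![vec![[3.0, 4.0, 0.0]], vec![[0.0, 0.0, 1.0], [2.0, 0.0, 0.0]]];
    assert_eq!(pooled_query(&queries, fake_nn), vec![vec![5.0], vec![1.0, 2.0]]);
}
```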

schlegelp commented 6 months ago

Another small update:

I made various small improvements, such as avoiding unnecessary copies/initialisations and using SIMD for the distance calculations.
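For concreteness, the SIMD pattern for the squared distance is roughly the following (a minimal sketch using Rust's nightly portable SIMD, `std::simd`; not the actual aann code, and on stable the `wide` crate offers similar types):

```rust
#![feature(portable_simd)] // nightly-only feature gate

use std::simd::f32x4;
use std::simd::prelude::*;

/// Pack a 3D point into four f32 lanes (fourth lane zeroed) so one
/// SIMD subtract and multiply covers all coordinates at once.
fn pack(p: [f32; 3]) -> f32x4 {
    f32x4::from_array([p[0], p[1], p[2], 0.0])
}

/// Squared Euclidean distance between two packed points.
fn sq_dist_simd(a: f32x4, b: f32x4) -> f32 {
    let d = a - b;
    (d * d).reduce_sum()
}

fn main() {
    let a = pack([1.0, 2.0, 3.0]);
    let b = pack([4.0, 6.0, 3.0]);
    assert_eq!(sq_dist_simd(a, b), 25.0); // 3^2 + 4^2 + 0^2
}
```

The zero in the fourth lane is free: it subtracts to zero and contributes nothing to the sum.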

Here is a new benchmark - this time with 200 hemibrain neurons, which favours implementations with shorter query times:

|               | build [s] | query [s] | total [s] |
|---------------|-----------|-----------|-----------|
| pykdtree      | 0.314892  | 232.683   | 232.998   |
| delaunay      | 41.8432   | 117.91    | 159.753   |
| delaunay_skel | 51.7473   | 46.3242   | 98.0715   |

As you can see, the graph-based approximate nearest-neighbour implementation is now ~2x faster in total than pykdtree. I might be able to bring the query time down a tad more, but I am also looking into better/different ways to build the neighbourhood graph.