alimanfoo / anjl

A neighbour-joining library for Python.
https://alimanfoo.github.io/anjl
MIT License
0 stars 0 forks source link

Rapid neighbour-joining #8

Closed alimanfoo closed 2 weeks ago

alimanfoo commented 2 weeks ago

This PR has an alternative implementation based on the rapid neighbour-joining algorithm of Simonsen et al..

(Surprisingly, despite the fact that the algorithm does significantly reduce the search space, the implementation is still slower than a well-tuned implementation of the canonical algorithm. The notebooks here have some detailed diagnostics.)

(I suspect the issue is due to memory locality. Because the canonical algorithm searches the distance matrix in memory order, CPU caching is efficient. The rapid algorithm takes pairs in sorted order, and unless the distance matrix is also sorted, this could mean relatively scattered memory access.)

(An alternative approach could be to sort the distance matrix so the scanning can also be in memory order. However I found implementing that difficult and backed off. I leave this implementation here for general interest and brave souls who feel like a challenge!)

Following on from some initial work, a sorted copy of the distance matrix is maintained and used to accelerate data access. We are now getting a ~3x speedup on a large distance matrix (~8000x8000) over the canonical implementation.

alimanfoo commented 2 weeks ago

Here are a couple of results from benchmarking a 2000x2000 distance matrix. The comparison is between the canonical algorithm, and the rapid algorithm with different frequencies of "garbage collection" which compact the data structures.

This plot shows the number of pairs "visited" during the search.

image

This plot shows the time per iteration of the algorithm.

image

This is the time per visit, i.e., the time spent within each cycle of the inner search loop.

image

Basically, the rapid algorithm does cut down the search space a lot, but spends about ~7x longer in each inner loop, without any obvious reason why in terms of additional computation. Hence suspecting memory locality.

alimanfoo commented 2 weeks ago

Some new results after changing to use and maintain a sorted copy of the distance matrix. Also change to avoid duplicate comparison of leaves.

image

image

image

alimanfoo commented 2 weeks ago

image