ArnaoutLab / diversity

Partitioned frequency- and similarity-sensitive diversity in Python
MIT License

Sparse abundance counts refactor #42

Closed: Elliot-D-Hill closed this issue 1 year ago

Elliot-D-Hill commented 2 years ago

Implementing the abundance counts as a sparse matrix would save memory and computation, remove the need to reindex the similarity matrix, and simplify species_ordering handling.

Sketch of the implementation:

  1. Take species_ordering from similarity matrix.
  2. Pivot counts, but before constructing the abundance matrix, sort the species indices in the same order as species_ordering.
  3. Construct sparse abundance matrix from sorted pivot rows.
  4. Clean up old code: remove unused species_subset arguments (but keep them in make_metacommunity).

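Steps 1 to 3 might look roughly like the sketch below. It is only an illustration under assumptions, not the package's actual code: the function name make_sparse_abundance, the long-format column names ("species", "subcommunity", "count"), and the toy data are all hypothetical. Instead of an explicit pivot followed by a sort, it maps each species label straight to its position in species_ordering and lets the sparse constructor sum duplicate entries, which has the same effect as pivoting with a sum.

```python
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix


def make_sparse_abundance(counts, species_ordering, subcommunity_ordering):
    """Build a sparse species x subcommunity count matrix (hypothetical helper).

    counts is assumed to be a long-format DataFrame with the columns
    'species', 'subcommunity' and 'count'.
    """
    species_index = pd.Index(species_ordering)
    subcommunity_index = pd.Index(subcommunity_ordering)
    # Row positions follow species_ordering, so the result already lines up
    # with the similarity matrix and no reindexing is needed afterwards.
    rows = species_index.get_indexer(counts["species"].to_numpy())
    cols = subcommunity_index.get_indexer(counts["subcommunity"].to_numpy())
    if (rows < 0).any() or (cols < 0).any():
        raise ValueError("counts contain labels missing from the orderings")
    shape = (len(species_index), len(subcommunity_index))
    # Duplicate (row, col) pairs are summed when converting COO to CSR.
    data = counts["count"].to_numpy()
    return coo_matrix((data, (rows, cols)), shape=shape).tocsr()


# Step 1: take species_ordering from the similarity matrix (toy identity matrix).
labels = ["a", "b", "c"]
similarity = pd.DataFrame(np.eye(3), index=labels, columns=labels)
species_ordering = similarity.index

# Toy counts, deliberately not in species_ordering order.
counts = pd.DataFrame(
    {
        "species": ["c", "a", "c"],
        "subcommunity": ["s1", "s1", "s2"],
        "count": [2, 5, 1],
    }
)

# Steps 2 and 3: build the sparse abundance matrix in species_ordering order.
abundance = make_sparse_abundance(counts, species_ordering, ["s1", "s2"])
print(abundance.toarray())
```

Species "b" has no counts, so its row stays empty in the sparse matrix but still occupies the position the similarity matrix expects.
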
Elliot-D-Hill commented 1 year ago

After some experimentation, it looks like we are unlikely to see a significant speed improvement in most cases, so I am closing this issue for now.