ArnaoutLab / diversity

Partitioned frequency- and similarity-sensitive diversity in Python
MIT License

SimilarityFromFunction regenerates similarity matrix multiple times #71

Closed chhotii-alex closed 10 months ago

chhotii-alex commented 10 months ago

The similarity matrix is generated three times, once each by the calls to metacommunity_similarity, subcommunity_similarity, and normalized_similarity. Because generating it is an O(n²) operation, we should avoid repeating it unnecessarily. In the intended use case the similarity matrix may be too large for RAM, so we don't want to cache the generated matrix; however, we could do all three matrix multiplications in the same pass, so that the matrix is generated only once. This should give a nearly threefold speed-up on large data sets.
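A minimal sketch of the idea, not the library's actual code: generate the similarity matrix one block of rows at a time, multiply each block by all of the abundance matrices stacked side by side, then discard the block. The function name `weighted_similarities_single_pass`, its parameters, and the `chunk_size` default are hypothetical and chosen only for illustration.

```python
import numpy as np


def weighted_similarities_single_pass(items, similarity_fn, abundance_matrices, chunk_size=1024):
    """Hypothetical helper: return [Z @ A for A in abundance_matrices],
    where Z[i, j] = similarity_fn(items[i], items[j]), without ever holding
    all of Z in memory and while generating each entry of Z exactly once."""
    n = len(items)
    # Stack the abundance matrices column-wise so one product serves them all.
    widths = [a.shape[1] for a in abundance_matrices]
    stacked = np.hstack(abundance_matrices)
    result = np.empty((n, stacked.shape[1]))
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        # Build only rows start..stop-1 of the similarity matrix.
        z_chunk = np.array(
            [[similarity_fn(items[i], items[j]) for j in range(n)]
             for i in range(start, stop)]
        )
        # One multiplication covers every abundance matrix at once.
        result[start:stop] = z_chunk @ stacked
    # Split the combined result back into one matrix per input.
    split_points = np.cumsum(widths)[:-1]
    return np.split(result, split_points, axis=1)
```

With this layout, peak memory is governed by `chunk_size * n` similarity values rather than `n * n`, and the similarity function is called `n * n` times in total instead of three times that.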