ArnaoutLab / diversity

Partitioned frequency- and similarity-sensitive diversity in Python
MIT License

sparse similarity matrix? #54

Closed chhotii-alex closed 10 months ago

chhotii-alex commented 1 year ago

From reading the code, it looks like the similarity matrix cannot be a sparse matrix (neither a scipy spmatrix nor an sparray, which is admittedly a new thing). This is unfortunate: whether we create the similarity matrix up front, have it generated on the fly by a Callable, or read it from a file, generating the whole thing is O(n²). Many problems will only be computationally tractable if we work out which pairs have more than negligible similarity and create values for only those entries of the array.
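To make the idea concrete, here is a minimal sketch of building a sparse similarity matrix that stores only the non-negligible pairs. Everything in it is an assumption for illustration and not part of the library: the function name `sparse_similarity`, the 1-D `positions` input, the `exp(-distance)` similarity, and the `cutoff` parameter are all hypothetical. Sorting the points lets each one be compared only with its near neighbours, so the full n × n array is never generated.

```python
import numpy as np
from scipy.sparse import coo_matrix


def sparse_similarity(positions, cutoff):
    """Hypothetical helper: similarity exp(-distance) between 1-D points,
    stored only for pairs whose distance is at most `cutoff`."""
    order = np.argsort(positions)
    sorted_pos = positions[order]
    n = len(positions)
    rows, cols, vals = [], [], []
    for a in range(n):
        # Walk forward through the sorted points until the gap exceeds the cutoff.
        b = a
        while b < n and sorted_pos[b] - sorted_pos[a] <= cutoff:
            sim = np.exp(-(sorted_pos[b] - sorted_pos[a]))
            i, j = order[a], order[b]
            rows.append(i)
            cols.append(j)
            vals.append(sim)
            if i != j:
                # Mirror the entry so the matrix stays symmetric.
                rows.append(j)
                cols.append(i)
                vals.append(sim)
            b += 1
    return coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()


# Three well-separated clusters: only within-cluster pairs are stored.
positions = np.array([0.0, 0.1, 5.0, 5.2, 10.0])
S = sparse_similarity(positions, cutoff=1.0)
```

The work here is proportional to the number of stored pairs plus the cost of the sort, rather than n². For real feature vectors, some other neighbour search would have to take the place of the 1-D sort.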

chhotii-alex commented 10 months ago

The same class that handles the similarity matrix as an np.array also seems to be well-behaved for scipy's sparse matrices/arrays. I just had to update the constructor function to choose that class. Still need to test that this works on other versions of scipy, since its API is evolving (tested on scipy 1.10.0; should also test 1.11.1, 1.11.4 and 1.12.0). Also need to test that other sparse classes (such as dok_array) are well-behaved; the remaining ones are created through a different API.
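A rough way to check this kind of compatibility outside the library is sketched below. It assumes the core operation on the similarity matrix is a matrix-vector product with the relative abundances, which is an assumption about the library and not taken from its code. The function `diversity_q1` is a hypothetical stand-in that computes the order-1 similarity-sensitive diversity, exp(-Σ pᵢ log (Zp)ᵢ). The loop covers several of scipy's sparse formats, including DOK, using the long-standing `*_matrix` classes and `asformat`.

```python
import numpy as np
from scipy.sparse import csr_matrix


def diversity_q1(similarity, abundances):
    """Hypothetical stand-in, not the library's code: order-1
    similarity-sensitive diversity, exp(-sum(p * log(Z @ p)))."""
    # `@` works for both np.ndarray and scipy sparse containers.
    ordinariness = np.asarray(similarity @ abundances).ravel()
    return float(np.exp(-np.sum(abundances * np.log(ordinariness))))


dense = np.array([
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
abundances = np.array([0.5, 0.25, 0.25])  # all positive, so the log is finite

dense_result = diversity_q1(dense, abundances)

# Same computation with the matrix held in several sparse formats.
sparse_results = {
    fmt: diversity_q1(csr_matrix(dense).asformat(fmt), abundances)
    for fmt in ("csr", "csc", "coo", "dok", "lil")
}
```

Running the same comparison under each scipy version of interest gives a quick signal about whether the sparse containers still behave like the dense case.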