djsutherland / py-sdm

Python implementation of nonparametric nearest-neighbor-based estimators for divergences between distributions.
http://cs.cmu.edu/~dsutherl/sdm/
BSD 3-Clause "New" or "Revised" License

sparse format for divergences #31

Open: djsutherland opened this issue 11 years ago

djsutherland commented 11 years ago

When doing Nyström and/or running in Hadoop Streaming, and especially with both, keeping the full divergence matrix in memory (when it is mostly nan) is awful. We should rewrite the core sections of estimate_divs to use a sparse format, probably a manual COO format (parallel x/y/value arrays) rather than actually using scipy.sparse in the Cython bit.
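A minimal sketch of the manual COO idea, assuming the caller already knows which (i, j) pairs it needs (e.g. all points against a set of Nyström landmarks). The names `estimate_divs_coo` and `mean_sq_diff` are hypothetical; `mean_sq_diff` is a toy placeholder, not one of the package's nearest-neighbor divergence estimators.

```python
import numpy as np


def mean_sq_diff(a, b):
    # Toy placeholder "divergence" (hypothetical), NOT a kNN estimator.
    return float((np.mean(a) - np.mean(b)) ** 2)


def estimate_divs_coo(features, pairs, div_func):
    """Hypothetical sketch: compute divergences only for the requested pairs.

    Returns three parallel arrays (rows, cols, vals) in COO layout instead of
    an n-by-n matrix that would be mostly nan.
    """
    pairs = np.asarray(pairs, dtype=np.intp).reshape(-1, 2)
    rows = pairs[:, 0].copy()
    cols = pairs[:, 1].copy()
    vals = np.empty(len(pairs), dtype=np.float64)
    for k in range(len(pairs)):
        # One divergence per requested pair; nothing is stored for the rest.
        vals[k] = div_func(features[rows[k]], features[cols[k]])
    return rows, cols, vals
```

For Nyström with `m` landmarks among `n` bags, the requested pairs would be roughly `n * m` entries (three arrays of that length) rather than an `n * n` array that is mostly nan.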

djsutherland commented 10 years ago

We probably want to special-case this so that it also handles the dense case well, without too much code repetition.
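One way to share code between the two cases, sketched under the assumption that only the output storage differs: a single core loop computes each pair's value and hands it to a small "store" callback, which writes either into a dense matrix or into COO arrays. All names here are hypothetical, and `mean_sq_diff` is again a toy stand-in for a real divergence estimator.

```python
import numpy as np


def mean_sq_diff(a, b):
    # Toy placeholder "divergence" (hypothetical), NOT a kNN estimator.
    return float((np.mean(a) - np.mean(b)) ** 2)


def _fill(features, pair_iter, div_func, store):
    # Shared core loop: identical for dense and sparse output.
    for k, (i, j) in enumerate(pair_iter):
        store(k, i, j, div_func(features[i], features[j]))


def estimate_divs_dense(features, div_func):
    # Dense special case: write straight into an n-by-n array, no index arrays.
    n = len(features)
    out = np.empty((n, n))

    def store(k, i, j, v):
        out[i, j] = v

    _fill(features, ((i, j) for i in range(n) for j in range(n)), div_func, store)
    return out


def estimate_divs_sparse(features, pairs, div_func):
    # Sparse case: values land in a flat array parallel to the (row, col) pairs.
    pairs = np.asarray(pairs, dtype=np.intp).reshape(-1, 2)
    vals = np.empty(len(pairs))

    def store(k, i, j, v):
        vals[k] = v

    _fill(features, pairs, div_func, store)
    return pairs[:, 0].copy(), pairs[:, 1].copy(), vals


def coo_to_dense(rows, cols, vals, n):
    # Materialize COO triples as a matrix, with nan for pairs never computed.
    out = np.full((n, n), np.nan)
    out[rows, cols] = vals
    return out
```

In Cython, a Python-level callback per pair would be slow; a compiled version might instead branch on an output-mode flag inside the loop or use fused types. This sketch only illustrates keeping one core loop for both layouts.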