When doing Nyström and/or running under Hadoop Streaming (and especially with both), keeping the full divergence matrix in memory is awful, since most of its entries are never computed and just sit there as NaN. The core sections of `estimate_divs` should be rewritten to use a sparse format: probably a manual COO format, i.e. parallel `x`/`y`/`value` arrays, rather than actually using `scipy.sparse` in the Cython bit.
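A rough sketch of what the manual COO layout could look like, in plain NumPy rather than Cython. Everything here is hypothetical: `estimate_divs_coo`, `coo_to_dense`, and the `div_fn` callback are made-up names, and the toy divergence (squared distance between bag means) is only a stand-in for the real estimator. The idea is that only the requested `(i, j)` pairs (e.g. every bag against the Nyström landmarks) are computed and stored, so memory scales with the number of pairs instead of with `n_bags ** 2`.

```python
import numpy as np


def estimate_divs_coo(bags, pairs, div_fn):
    """Hypothetical sketch: compute divergences only for the requested (i, j)
    pairs and return them as a manual COO triple of parallel arrays."""
    pairs = np.asarray(pairs, dtype=np.int64).reshape(-1, 2)
    x = pairs[:, 0].copy()   # row index (source bag) of each stored entry
    y = pairs[:, 1].copy()   # column index (target bag) of each stored entry
    value = np.empty(len(pairs), dtype=np.float64)
    for k in range(len(pairs)):
        # A Cython version could loop over typed views of these same arrays.
        value[k] = div_fn(bags[x[k]], bags[y[k]])
    return x, y, value


def coo_to_dense(x, y, value, shape):
    """Expand back to the old dense, NaN-filled layout (for checking only)."""
    out = np.full(shape, np.nan)
    out[x, y] = value
    return out


def toy_div(a, b):
    """Placeholder divergence: squared distance between bag means (NOT the real estimator)."""
    return float(np.sum((a.mean(axis=0) - b.mean(axis=0)) ** 2))


# Usage: three 1-D bags, with bag 0 as the single (hypothetical) Nyström landmark.
bags = [np.array([[0.0], [1.0]]), np.array([[2.0], [3.0]]), np.array([[5.0]])]
pairs = [(0, 0), (1, 0), (2, 0)]
x, y, value = estimate_divs_coo(bags, pairs, toy_div)
dense = coo_to_dense(x, y, value, (3, 3))
```

With `n` bags and `m` landmarks this keeps `n * m` triples instead of an `n * n` matrix, and under Hadoop Streaming each mapper could emit its `x`/`y`/`value` triples directly without ever materializing the dense matrix.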