Closed wasade closed 1 year ago
Test failure is due to doc build with Sphinx on py3.7, attempting to work through it.
...okay, now waiting on resources to test this.
@qiyunzhu, Antonio is out-of-office for a while. Is there any chance you could do a review of this PR? The change is relatively straightforward: instead of aggregating data in a dense numpy matrix, we aggregate into a "dict of keys" sparse matrix followed by conversion to compressed sparse column.
The current method supports divide, and no problem for BIOM. Good call on float64
, just pushed a change for that.
The divide may trigger an underflow rather than an overflow. In which case, numpy would warn:
In [2]: v = np.float64(1e1000)
In [3]: v /= 1e1000
<ipython-input-3-4eac3ab25e8a>:1: RuntimeWarning: invalid value encountered in double_scalars
v /= 1e1000
In [4]: v
Out[4]: nan
Thanks! My test is still running so will hold off merge for the time being
Avoid dense representation on one_to_many. This code path was implicated in
woltka collapse
.Note this is a WIP, I am asserting the memory reduction in
woltka collapse
right now. However, the high memory requirement is nearly certainly due to the dense memory representation previously used.cc @antgonza @qiyunzhu