Open atolopko-czi opened 3 years ago
If the above solution seems reasonable, I can go ahead and make the PR. Let me know!
Please do! I'm not sure when the next cellxgene release will be cut, but this may be reason enough to do so.
From Slack: Alexander Tarashansky: In backend/common/compute/diffexp_generic.py (lines 122-133):
I added a comment next to the problem line. For very large sparse matrices (like Tabula Sapiens), this line creates an intermediate, fully dense copy of the sparse data. I'd recommend using this utility function from sklearn instead, which computes the mean and variance directly on the sparse matrix: http://scikit-learn.org/stable/modules/generated/sklearn.utils.sparsefuncs.mean_variance_axis.html
For example, finding marker genes for a cluster of 20 cells in Tabula Sapiens took 8 minutes before the above change and 40 seconds after.
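For anyone following along, here is a rough sketch of what the suggested replacement could look like. The original lines from diffexp_generic.py are not reproduced in this thread, so the helper name `sparse_mean_var` and the random test matrix are hypothetical and only illustrate the sklearn call. Note that `mean_variance_axis` returns the population variance (dividing by n), so it would need rescaling by n / (n - 1) if the surrounding code expects the sample variance.

```python
import numpy as np
from scipy import sparse
from sklearn.utils.sparsefuncs import mean_variance_axis


def sparse_mean_var(X):
    """Per-column mean and population variance, computed without densifying X.

    Hypothetical helper for illustration; not the actual cellxgene code.
    """
    # mean_variance_axis expects CSR or CSC input; float64 keeps the result stable.
    X = sparse.csr_matrix(X, dtype=np.float64)
    # axis=0 reduces over rows, giving one value per column (per gene).
    mean, var = mean_variance_axis(X, axis=0)
    return mean, var


# Small random sparse matrix standing in for a cells-by-genes expression matrix.
X = sparse.random(200, 50, density=0.05, format="csr", random_state=0, dtype=np.float64)
mean, var = sparse_mean_var(X)
```

The point is that the matrix stays in its sparse format throughout, so memory use scales with the number of nonzero entries rather than with cells times genes.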