chanzuckerberg / cellxgene-census

CZ CELLxGENE Discover Census
https://chanzuckerberg.github.io/cellxgene-census/
MIT License

[python] ExperimentDataPipe: configurable `method` (`np.array`, `scipy.coo`, `scipy.csr`) #1169

Open ryan-williams opened 1 month ago

ryan-williams commented 1 month ago

Introduce a new method ("np.array") for converting (COO) tiledbsoma.SparseNDArray data to (dense) torch.Tensor.

Comparison vs. existing ("scipy.csr") method:

[Image: comparison of the "np.array" and "scipy.csr" methods]

Code/data here:

scipy.csr

Convert arrow.Table to scipy.sparse.csr_matrix (source). This is the current behavior.

np.array

Directly convert arrow.Table to np.array (source).

This method is new in this PR. It appears to be faster than "scipy.csr", at the cost of higher memory use, because it brings SOMA chunks into memory as dense np.arrays.
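
To make the trade-off concrete, here is a minimal sketch of the two conversions, not the code from this PR. It assumes the COO columns of a chunk (`soma_dim_0`, `soma_dim_1`, `soma_data`) have already been pulled out of the arrow.Table as NumPy arrays; the helper names `coo_to_dense` and `coo_to_csr` are hypothetical.

```python
import numpy as np
from scipy import sparse


def coo_to_dense(rows, cols, data, shape):
    """Hypothetical "np.array" path: scatter COO values into a dense array.

    Assumes each (row, col) coordinate appears at most once.
    """
    dense = np.zeros(shape, dtype=data.dtype)
    dense[rows, cols] = data
    return dense


def coo_to_csr(rows, cols, data, shape):
    """Hypothetical "scipy.csr" path: build a CSR matrix from COO triplets."""
    return sparse.csr_matrix((data, (rows, cols)), shape=shape)


# Stand-ins for the soma_dim_0 / soma_dim_1 / soma_data columns of one chunk.
rows = np.array([0, 0, 2], dtype=np.int64)
cols = np.array([1, 3, 0], dtype=np.int64)
data = np.array([1.0, 2.0, 3.0], dtype=np.float32)
shape = (3, 4)

dense = coo_to_dense(rows, cols, data, shape)
csr = coo_to_csr(rows, cols, data, shape)

# Both paths describe the same matrix; only the in-memory layout differs.
assert np.array_equal(dense, csr.toarray())

# The dense array always stores rows * cols values, zeros included.
dense_bytes = dense.nbytes
# CSR stores only the non-zero values plus its index arrays.
csr_bytes = csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes
```

The dense path skips building the CSR index structures, which is one plausible reason for the speedup, but its memory footprint scales with the full chunk shape rather than with the number of non-zero entries.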