Loading full dataset h5ad file crashes Kernel

LungCellAtlas / HLCA

MIT License


Issue #5

Status: closed (Tonyspuri, 1 year ago)

Tonyspuri commented 1 year ago

Hi,

This might be an anndata/scanpy question, but I'm trying to load the full dataset on a computer with 64 GB of RAM. Even though the h5ad file is only ~20 GB, reading it quickly fills the memory and crashes the kernel. Unfortunately, increasing the memory isn't an option.
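A rough way to see why a ~20 GB file can overflow 64 GB of RAM: compression only shrinks the matrix on disk, and once loaded every stored entry costs its full dtype size (plus one copy per extra matrix such as layers or `raw`). The sketch below estimates the in-memory size of one matrix. The function names, the dtypes (float32 values, int32 column indices, int64 row pointers) and the example sizes are all assumptions for illustration, not the real HLCA layout.

```python
def csr_nbytes(n_obs, nnz, data_bytes=4, index_bytes=4, indptr_bytes=8):
    """Approximate RAM for one CSR matrix, ignoring Python object overhead.

    Assumes float32 values, int32 column indices and int64 row pointers;
    change the byte sizes if the file uses other dtypes.
    """
    values = nnz * data_bytes            # one value per stored (non-zero) entry
    indices = nnz * index_bytes          # one column index per stored entry
    indptr = (n_obs + 1) * indptr_bytes  # one row pointer per cell, plus one
    return values + indices + indptr


def dense_nbytes(n_obs, n_vars, data_bytes=4):
    """RAM needed if the same matrix were held dense (float32 by default)."""
    return n_obs * n_vars * data_bytes


if __name__ == "__main__":
    # Made-up sizes for illustration only, not the real atlas dimensions.
    n_obs, n_vars, nnz = 1_000_000, 30_000, 2_000_000_000
    print(f"sparse X: {csr_nbytes(n_obs, nnz) / 1e9:.1f} GB")
    print(f"dense X:  {dense_nbytes(n_obs, n_vars) / 1e9:.1f} GB")
```

Plugging in the real `n_obs`, `nnz` and dtypes of each matrix in the file gives a ballpark for how much RAM a full `read_h5ad` needs.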

Versions: anndata 0.9.1, scanpy 1.9.3

Thanks!

Update: loading the AnnData object in read-only backed mode helps: `adata = sc.read_h5ad('./local.h5ad', backed='r')`

LisaSikkema commented 1 year ago

Hey @Tonyspuri, that sounds right: the atlas was stored as a compressed file, and decompressed it's about 100 GB. You'll indeed have to use backed mode; from there you can subset and load only the subset into memory. You could, for example, subset genes: there are currently about 60k genes in the object, and you likely won't need many of them, as some might be specific to single datasets.

LisaSikkema commented 1 year ago

I will close this issue now, but feel free to re-open if you have further questions related to this issue!