LungCellAtlas / HLCA

MIT License

HLCA Full Seurat? #11

Closed: Dzhan4 closed this issue 7 months ago

Dzhan4 commented 1 year ago

The CZI website states that limitations of R's dgCMatrix prevent a Seurat object download for the full dataset.
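For context, here is a rough sketch of the kind of check that limitation implies. It assumes the constraint is the commonly cited one: a dgCMatrix indexes its stored entries with 32-bit integers, so it cannot hold more than 2^31 - 1 non-zero values. The cell count, gene count, and density below are made-up illustrative numbers, not the actual HLCA figures, and the function names are hypothetical.

```python
# Assumption: a dgCMatrix can store at most 2**31 - 1 non-zero entries,
# because its index slots are 32-bit signed integers.
INT32_MAX = 2**31 - 1


def estimate_nnz(n_cells: int, n_genes: int, density: float) -> int:
    """Estimate the number of non-zero entries in a cells x genes matrix."""
    return round(n_cells * n_genes * density)


def fits_in_dgcmatrix(nnz: int) -> bool:
    """Return True if the non-zero count is within the assumed 32-bit cap."""
    return nnz <= INT32_MAX


if __name__ == "__main__":
    # Illustrative values only; substitute the real shape and density.
    nnz = estimate_nnz(n_cells=2_000_000, n_genes=30_000, density=0.05)
    print(f"estimated non-zeros: {nnz:,}")
    print(f"fits in a single dgCMatrix: {fits_in_dgcmatrix(nnz)}")
```

If the estimate comes out above the cap, a single in-memory dgCMatrix cannot hold the counts regardless of how much RAM is available, which is a different problem from the object merely being large.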

Since there are newer methods for converting large h5ad datasets to Seurat objects, is it worth attempting this conversion? Or is there a fundamental reason why a Seurat object cannot hold the full dataset?

Thank you!

LisaSikkema commented 1 year ago

Hi @Dzhan4, I'm not very knowledgeable on this subject, as I haven't used the HLCA in R myself yet, but my understanding is that the full R object is simply too large to work with in R. Would the new conversion method create a smaller R object? If so, it might be worth trying. The cellxgene team would probably have better advice here. I do know that you can use the cellxgene Census (https://chanzuckerberg.github.io/cellxgene-census/) to subset the data before downloading, which should let you download and work with a subset in Seurat if you don't need all of the data.
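A minimal sketch of what that subsetting could look like with the Census Python package, in case it helps. It assumes `cellxgene_census` is installed and that the `obs` columns `tissue_general`, `is_primary_data`, and `cell_type` are the ones you want to filter on. The helper names and the output path are hypothetical. Note that this filter pulls matching lung cells from the whole Census, not only from the HLCA; restricting to the HLCA would need an extra `dataset_id` condition, which is not filled in here.

```python
def build_obs_filter(tissue_general: str, cell_types: list[str]) -> str:
    """Build a Census obs_value_filter string for a tissue and optional cell types."""
    parts = [
        f"tissue_general == '{tissue_general}'",
        "is_primary_data == True",  # skip cells duplicated across datasets
    ]
    if cell_types:
        parts.append(f"cell_type in {cell_types!r}")
    return " and ".join(parts)


def fetch_subset(obs_filter: str, out_path: str):
    """Download the filtered cells as an AnnData object and save it as h5ad."""
    import cellxgene_census  # imported lazily; requires the package and network access

    with cellxgene_census.open_soma() as census:
        adata = cellxgene_census.get_anndata(
            census,
            organism="Homo sapiens",
            obs_value_filter=obs_filter,
        )
    adata.write_h5ad(out_path)
    return adata


if __name__ == "__main__":
    # Example cell type label; check the Census metadata for the exact spelling.
    obs_filter = build_obs_filter("lung", ["alveolar macrophage"])
    print(obs_filter)
    # fetch_subset(obs_filter, "lung_subset.h5ad")  # hypothetical output path
```

The resulting h5ad file is much smaller than the full atlas, so converting it to a Seurat object afterwards should be far less likely to hit size limits.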

HendricksJudy commented 12 months ago

Analyzing a dataset this large (about 22 GB) in R is possible. However, because of how R manages memory, it may need 64 GB of RAM or more to be usable (my environment has 128 GB). It is also slow: a single command in R can take minutes. It is better to try cellxgene or Python 😃
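For a rough idea of where numbers like these come from, here is a back-of-envelope sketch. It assumes the counts sit in a compressed sparse column matrix with 8-byte double values, 4-byte row indices, and 4-byte column pointers, which is how a dgCMatrix is usually described. The non-zero count and column count in the example are invented for illustration, and the function name is hypothetical.

```python
def sparse_csc_bytes(nnz: int, n_cols: int) -> int:
    """Approximate bytes for a CSC matrix with double values and int32 indices."""
    value_bytes = nnz * 8            # one 8-byte double per stored value
    row_index_bytes = nnz * 4        # one 4-byte row index per stored value
    col_pointer_bytes = (n_cols + 1) * 4  # one 4-byte pointer per column, plus one
    return value_bytes + row_index_bytes + col_pointer_bytes


if __name__ == "__main__":
    # Illustrative numbers only, not measured from the HLCA.
    size_gb = sparse_csc_bytes(nnz=1_500_000_000, n_cols=1_000_000) / 1e9
    print(f"approximate size of one copy: {size_gb:.1f} GB")
```

This covers a single copy of the matrix only. Operations that duplicate or modify the object can hold several copies at once, which is one plausible reason the RAM needed ends up well above the file size.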