bowang-lab / scGPT

https://scgpt.readthedocs.io/en/latest/
MIT License

Size of the whole human dataset #212

Closed NozomiMizore closed 4 months ago

NozomiMizore commented 4 months ago

Thanks for your great work! The paper and docs mention that the whole-human dataset consists of data from 33M cells, and you have provided shell scripts to crawl the data from CELLxGENE. What is the total size of the whole-human dataset on disk? How many TB?

subercui commented 4 months ago

Hi, I think the raw data in AnnData h5ad format is around 700 GB. If you crawl the data now, the total may be larger, because CELLxGENE has been updated since then.
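
For a rough sense of scale, 700 GB across 33M cells is about 21 KB per cell on disk. To check what a fresh crawl actually takes up, here is a minimal sketch. It assumes the downloaded files all end in `.h5ad` and sit under a single directory; the function names and the example path are hypothetical and not part of the scGPT scripts.

```python
from pathlib import Path


def total_h5ad_size_bytes(root):
    """Sum the on-disk size of every .h5ad file under root (recursive)."""
    return sum(p.stat().st_size for p in Path(root).rglob("*.h5ad") if p.is_file())


def human_readable(num_bytes):
    """Format a byte count using decimal units (1 GB = 10**9 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        # Stop at the first unit that keeps the number below 1000, or at TB.
        if size < 1000 or unit == "TB":
            return f"{size:.1f} {unit}"
        size /= 1000


# Example usage (hypothetical path, adjust to wherever the crawl wrote its files):
# print(human_readable(total_h5ad_size_bytes("./cellxgene_raw")))
```

This only measures files already on disk, so the number depends on when the crawl ran and which CELLxGENE release it hit.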