SydneyBioX / scDC

https://github.com/SydneyBioX/scDC

Working with large cell numbers #1

Closed lucygarner closed 3 years ago

lucygarner commented 4 years ago

Hi,

I was wondering whether you have any approaches for using scDC with large cell numbers. I am working with a dataset of ~500,000 cells, and unfortunately it seems that the scDC approach will be infeasible in this case. I have been running scDC_noClustering for 12 hours with 12 cores and only 50 bootstraps, and it is still at the "Calculating bootstrap proportion..." phase.

Thank you.

Best, Lucy

ycao6928 commented 4 years ago

Hi Lucy,

Thanks for trying out scDC! Would it work for you to randomly sample a smaller subset of the cells, e.g. 5,000, treat it as a representative population, and run scDC on that?
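
A minimal sketch of that subsampling idea, written in plain Python rather than R and not part of scDC itself. The function name `subsample_cells`, the toy cell IDs, and the toy cell-type labels are all hypothetical; the point is only that a simple random sample without replacement roughly preserves cell-type proportions.

```python
import random
from collections import Counter


def subsample_cells(cell_ids, n_cells, seed=0):
    """Return a simple random subsample (without replacement) of cell IDs.

    Hypothetical helper for illustration; not an scDC function.
    """
    cell_ids = list(cell_ids)
    if n_cells >= len(cell_ids):
        return cell_ids
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(cell_ids, n_cells)


def proportions(labels):
    """Fraction of cells per cell type."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cell_type: n / total for cell_type, n in counts.items()}


# Toy data (assumed): 500,000 cells with a 60/30/10 split over three types.
cell_ids = [f"cell_{i}" for i in range(500_000)]
cell_type = {
    cid: ("T" if i % 10 < 6 else "B" if i % 10 < 9 else "NK")
    for i, cid in enumerate(cell_ids)
}

subset = subsample_cells(cell_ids, 5_000, seed=1)

full_props = proportions(cell_type.values())
subset_props = proportions(cell_type[cid] for cid in subset)

for ct in sorted(full_props):
    print(ct, round(full_props[ct], 3), round(subset_props[ct], 3))
```

Repeating the subsample with several different seeds and comparing the resulting proportions is one way to check how sensitive the results are to the downsampling.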

lucygarner commented 4 years ago

Thank you for the suggestion. I worry that the results would not be robust if we had to downsample that much. From your testing, what is the maximum number of cells that scDC can scale to?

ycao6928 commented 4 years ago

The largest dataset we have tested so far has 4,000 cells and 10 cell types. I don't remember the exact run time, but it was reasonable.

lucygarner commented 4 years ago

OK, thank you.