Open Jaimomar99 opened 1 year ago
You need to aim for sufficient representation of your populations of interest. In some rare cases, very few cells (e.g. 10) can suffice to define informative reference gene expression signatures. However, I would generally recommend 100s of cells per population of interest, and at least 40-50 cells for rarer populations. For an atlas of one tissue, you can get a decent reference with 40k cells (e.g. the mouse brain data in our paper).
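As a rough way to apply this rule of thumb to an existing reference, here is a minimal sketch that counts cells per annotated population and flags the ones that fall short. The function name `check_reference_coverage`, the thresholds (`recommended=100`, `minimum=40`), and the example labels are illustrative assumptions, not part of the package; adjust them to your own data.

```python
from collections import Counter


def check_reference_coverage(labels, recommended=100, minimum=40):
    """Count cells per population and flag under-represented ones.

    `labels` is one cell-type label per cell. The thresholds are
    assumptions taken from the rule of thumb above (100s of cells per
    population, at least 40-50 for rarer ones), not hard limits.
    """
    counts = Counter(labels)
    report = {}
    for population, n_cells in counts.items():
        if n_cells >= recommended:
            status = "ok"
        elif n_cells >= minimum:
            status = "borderline"
        else:
            status = "under-represented"
        report[population] = (n_cells, status)
    return report


# Hypothetical annotation: one label per cell in the reference.
labels = ["neuron"] * 250 + ["astrocyte"] * 60 + ["microglia"] * 12
report = check_reference_coverage(labels)
for population, (n_cells, status) in report.items():
    print(f"{population}: {n_cells} cells ({status})")
```

With an AnnData reference, the labels would typically come from a column of `adata.obs` (the column name depends on your annotation).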
Hey, thanks a lot for this wonderful package. I've been experimenting with it, and the results have been very accurate according to the pathologists.
I have a question about the size of the single-cell dataset. I assume that the bigger the dataset, the better the results will be. However, I'm unsure about the optimal number of cells I should aim for when generating a single-cell dataset. I'm trying to find the right balance between performance and cost-effectiveness. Of course there is no single correct answer, but I would appreciate any insights you could provide.
For instance, would a dataset with 4k cells be sufficient, or should I aim for a larger number? Are there any research papers or methods that explore this correlation?

Jaime