lkqnaruto closed this issue 3 years ago.
In general, the cell-type composition of the reference does not have to match that of the test, or even be particularly similar, as long as the cell types in the reference are a superset of those in the test. SingleR labels each test cell independently by comparing it to the reference, so the label proportions in the reference do not act as a prior on the predicted proportions. If my reference contained PBMCs and my test contained some purified subtype, then a difference in composition would not be surprising.
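To make the point about proportions concrete, here is a minimal Python sketch of a simplified, SingleR-style scoring step (not the package's actual code). Each test cell is Spearman-correlated with every reference cell, the correlations for each label are summarized by their 0.8 quantile (assumed here to mirror SingleR's default), and the highest-scoring label wins. Marker-gene selection and fine-tuning are left out, and the expression values and the labels "Other" and "AT1" are made up. Because each label contributes a single score no matter how many reference cells it has, a label that makes up 1% of the reference can still be assigned to 30% of the test cells.

```python
from collections import Counter


def rank(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    return [sum(w < v for w in values) + (sum(w == v for w in values) + 1) / 2
            for v in values]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def quantile(values, q):
    """Linearly interpolated quantile of a non-empty list."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def assign_label(test_cell, ref_cells, ref_labels, q=0.8):
    """Score every label by the q-quantile of its per-cell correlations and
    return the best one. Label frequency in the reference is never used."""
    scores = {}
    for label in sorted(set(ref_labels)):
        cors = [spearman(test_cell, ref)
                for ref, lab in zip(ref_cells, ref_labels) if lab == label]
        scores[label] = quantile(cors, q)
    return max(scores, key=scores.get)


# Toy reference: 99 "Other" cells and one "AT1" cell (1% of the reference).
ref_cells = [[10, 8, 2, 1]] * 99 + [[1, 2, 8, 10]]
ref_labels = ["Other"] * 99 + ["AT1"]

# Toy test set: 70 Other-like cells and 30 AT1-like cells (4 genes each).
test_cells = [[9, 7, 3, 1]] * 70 + [[2, 1, 9, 8]] * 30

predicted = [assign_label(cell, ref_cells, ref_labels) for cell in test_cells]
counts = Counter(predicted)
print(counts["AT1"] / len(predicted))  # 0.3
```

In this sketch the number of reference cells per label only affects how noisy the quantile is (a label with one reference cell is scored from a single correlation); it never shifts how often that label is predicted.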
That said, if you expect the reference and test to be similar, then there is some cause for concern. I would run through some of the diagnostics in the SingleR book and check that the predicted AT1 cells express sensible AT1 markers, based on the de.genes and some biological knowledge of what defines an AT1 cell. That should indicate whether the labels can be trusted.
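The marker check suggested above can be sketched in Python as follows, assuming you have exported per-cell expression values for a few marker genes together with the SingleR labels. The function reports, for each predicted label, the fraction of its cells in which each marker is detected (nonzero). AGER and SFTPC, commonly used AT1 and AT2 markers, serve only as illustrative placeholders; the toy values and the 0.8 detection threshold (`min_rate`) are hypothetical. In practice the marker list would come from de.genes and prior biological knowledge.

```python
def marker_detection_rates(expr, labels, markers):
    """For each predicted label, return the fraction of its cells with nonzero
    expression of each marker gene.

    expr:    dict mapping gene name -> list of per-cell expression values
    labels:  list of predicted labels, one per cell (same order as expr lists)
    markers: gene names to check
    """
    rates = {}
    for label in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == label]
        rates[label] = {
            gene: sum(1 for i in idx if expr[gene][i] > 0) / len(idx)
            for gene in markers
        }
    return rates


def flag_suspicious(rates, expected_marker, min_rate=0.8):
    """Return labels whose expected marker is detected in fewer than
    min_rate of the cells assigned to them (min_rate is hypothetical)."""
    return [label for label, gene in sorted(expected_marker.items())
            if label in rates and rates[label][gene] < min_rate]


# Toy data: six cells with SingleR-style predicted labels.
labels = ["AT1", "AT1", "AT1", "AT1", "AT2", "AT2"]
expr = {
    "AGER":  [5, 3, 0, 0, 0, 0],   # illustrative AT1 marker
    "SFTPC": [0, 0, 4, 6, 7, 5],   # illustrative AT2 marker
}

rates = marker_detection_rates(expr, labels, ["AGER", "SFTPC"])
print(rates)  # {'AT1': {'AGER': 0.5, 'SFTPC': 0.5}, 'AT2': {'AGER': 0.0, 'SFTPC': 1.0}}
print(flag_suspicious(rates, {"AT1": "AGER", "AT2": "SFTPC"}))  # ['AT1']
```

Here two of the four cells labelled AT1 express SFTPC but not AGER, which is the kind of pattern that would cast doubt on the AT1 labels.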
Hi
My reference dataset contains very few AT1 and AT2 cells (around 1% of all cells). However, in the SingleR output, cells annotated as AT1 and AT2 dominate the test set, making up almost 30%. Is this statistically and computationally reasonable?
I know SingleR chooses the "best" label for a cell by iteratively filtering out poor candidate labels until two labels are left. From this point of view, it seems reasonable that SingleR identifies a higher proportion of AT1 and AT2 cells than I expected. But I would still like some input from you; your feedback on this case would be very helpful.
Thanks!
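The iterative filtering described above can be sketched in Python as follows. It follows the description in the question (drop the weakest candidate, re-score on genes that separate the remaining candidates, stop at two and keep the better one), not the internals of the SingleR package, whose fine-tuning step has its own thresholds. The label profiles, gene values, and the `n_genes` parameter are hypothetical. The loop only compares candidate labels for one cell at a time; how common each label is in the reference never enters.

```python
def rank(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    return [sum(w < v for w in values) + (sum(w == v for w in values) + 1) / 2
            for v in values]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def fine_tune(test_cell, profiles, n_genes=4):
    """Drop the worst candidate label until two remain, then return the
    better of the final two. n_genes is a hypothetical parameter."""
    candidates = list(profiles)
    n_total = len(test_cell)
    while True:
        # Keep the genes whose reference values differ most among the
        # remaining candidates (largest max - min across their profiles).
        spread = [max(profiles[c][g] for c in candidates)
                  - min(profiles[c][g] for c in candidates)
                  for g in range(n_total)]
        genes = sorted(range(n_total), key=lambda g: spread[g],
                       reverse=True)[:n_genes]
        scores = {c: spearman([test_cell[g] for g in genes],
                              [profiles[c][g] for g in genes])
                  for c in candidates}
        if len(candidates) <= 2:
            # Two (or fewer) labels left: keep the better-scoring one.
            return max(scores, key=scores.get)
        # Otherwise drop the weakest candidate and re-score the rest.
        candidates.remove(min(scores, key=scores.get))


# Hypothetical per-label reference profiles over six genes.
profiles = {
    "AT1":   [9, 1, 1, 8, 1, 1],
    "AT2":   [1, 9, 1, 1, 8, 1],
    "Other": [1, 1, 9, 1, 1, 8],
}

print(fine_tune([8, 2, 1, 9, 1, 2], profiles))  # AT1
print(fine_tune([1, 1, 8, 2, 1, 9], profiles))  # Other
```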