dviraran / SingleR

SingleR: Single-cell RNA-seq cell types Recognition (legacy version)
GNU General Public License v3.0

quantile.use argument #51

Closed roosheelpatel closed 3 years ago

roosheelpatel commented 5 years ago

Hi, thanks for developing such a great package! Could you elaborate on the decision-making process behind the 'quantile.use' argument? I have been experimenting with different values and am getting widely different results on each run. I looked at the documentation:

'correlation coefficients are aggregated for multiple cell types in the reference data set. This parameter allows to choose how to sort the cell types scores, by median (0.5) or any other number between 0 and 1. The default is 0.9.'

What does a quantile of 0.9 mean, and is there any intuition for how to choose this value for a given dataset?

Thank you so much and look forward to hearing from you!

dviraran commented 5 years ago

Thanks.

You can see in supp info 1 how SingleR makes its decisions. Each cell type in the reference may have multiple samples, and SingleR chooses the top-scoring cell types. For each single cell, you can imagine a set of boxplots like the ones shown there, one boxplot of correlation coefficients per reference cell type. The question is how to order them. One option is to order by the median, but the samples associated with a cell type may be a mix of multiple subsets, so the median can be misleading. Another option is to use the max (1), but this can give false results because of random noise. I tried 0.75, 0.8, and 0.9, and they all gave more or less similar results. The intuition I use: if there are many samples for each cell type, use a high value, since multiple subtypes might be combined together. This is why I use 0.9 for the 'main cell types' option.
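To make the ordering question concrete, here is a minimal Python sketch. It is not SingleR's actual code: the function names `quantile` and `rank_cell_types`, the cell-type labels, and the correlation values are all made up for illustration. It assumes each reference cell type has a handful of per-sample correlations with one single cell, and it summarizes each type by a chosen quantile using the same linear interpolation as R's default `quantile`.

```python
import math


def quantile(values, q):
    """Quantile with linear interpolation (R's default 'type 7' convention)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("values must not be empty")
    pos = q * (len(xs) - 1)          # fractional index into the sorted values
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def rank_cell_types(correlations, quantile_use=0.9):
    """Order cell types by the chosen quantile of their per-sample correlations."""
    scores = {ct: quantile(v, quantile_use) for ct, v in correlations.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical correlations between ONE single cell and each reference sample.
correlations = {
    # Mixed subsets: two of the five samples match the cell well.
    "T_cell": [0.30, 0.32, 0.35, 0.70, 0.72],
    # Homogeneous samples, all moderately correlated.
    "B_cell": [0.50, 0.51, 0.52, 0.53, 0.54],
    # Mostly poor matches plus one chance outlier.
    "NK_cell": [0.10, 0.12, 0.15, 0.18, 0.95],
}

for q in (0.5, 0.9, 1.0):
    print(q, rank_cell_types(correlations, quantile_use=q))
```

With these invented numbers, the median (0.5) ranks the homogeneous type first and hides the well-matching subset, the max (1.0) ranks the type with a single outlier first, and 0.9 ranks the mixed type first. That matches the trade-off described above, though real data will not separate this cleanly.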

Hope this makes sense.

Best, Dvir