IanevskiAleksandr / sc-type

GNU General Public License v3.0

Different cell type scoring dependent on subsamples used #67

Open ahoffrichter opened 3 months ago

ahoffrichter commented 3 months ago

Dear Aleksandr,

thank you for providing this tool. I used it to annotate samples that I integrated in different combinations, e.g. Integration 1: Sample 1 & Sample 2; Integration 2: Samples 1-6; etc.

I realised that, depending on which samples are integrated together, the same sample gets completely different cell types assigned to it. For example, in Integration 1, Sample 2 is classified mainly as microglia, whereas in Integration 2, Sample 2 is classified mainly as glutamatergic neurons. This is exactly the same sample, just combined with different samples (see attached image). Could you explain why this happens? And how can I decide which classification to trust more?
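A toy sketch of one possible contributor, assuming the scoring works on expression values that are standardized (z-scored) per gene across all cells in the integrated object. That assumption, the function name `zscore`, and all numbers below are hypothetical and not taken from the sc-type code. Under that assumption, the same cells can sit above the mean in one integration and below it in another:

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardize values against the mean and SD of the cells provided."""
    mu = mean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]

# Made-up expression of a single marker gene in three groups of cells.
sample1 = [0.0, 0.5, 1.0]             # low expressers
sample2 = [2.0, 2.5, 3.0]             # the sample of interest
samples3to6 = [6.0, 7.0, 8.0, 9.0]    # high expressers

integration1 = sample1 + sample2
integration2 = sample1 + sample2 + samples3to6

# Pull out the z-scores belonging to Sample 2 in each integration.
start, stop = len(sample1), len(sample1) + len(sample2)
z_in_integration1 = zscore(integration1)[start:stop]
z_in_integration2 = zscore(integration2)[start:stop]

print("Sample 2 in Integration 1:", z_in_integration1)
print("Sample 2 in Integration 2:", z_in_integration2)
```

In this toy case the marker looks enriched in Sample 2 when only Sample 1 is present and depleted once the high-expressing samples are added, even though the raw values for Sample 2 never change.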

Another thing I noticed is that the number of markers provided in the database also seems to influence the cell type assignment. I added some entries to the DB you provided. When I use ~100 markers for cell type x, most cells are assigned either to this cell type or to unknown. When I use only a subset of ~10 markers for cell type x, more cells are assigned to other cell types as well. Would you advise against using more than, say, 15 markers per cell type?
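A toy sketch of why marker count could matter, assuming a per-cell score of the form "sum of scaled marker expression divided by the square root of the number of markers". Whether sc-type uses exactly this form should be checked against its source; `set_score` and the numbers are hypothetical. With such a score, a long marker list that is only moderately expressed can outscore a short list that is more strongly expressed:

```python
import math

def set_score(z_values):
    """Sum of per-marker z-scores, normalized by sqrt of the marker count."""
    return sum(z_values) / math.sqrt(len(z_values))

# Made-up per-marker z-scores for a single cell.
x_with_10_markers = set_score([1.0] * 10)     # cell type x, short list
x_with_100_markers = set_score([1.0] * 100)   # cell type x, long list
other_type = set_score([1.5] * 10)            # competing type, stronger per marker

print("x, 10 markers: ", x_with_10_markers)
print("x, 100 markers:", x_with_100_markers)
print("other type:    ", other_type)
```

Here cell type x loses to the competing type with 10 markers but wins with 100, although its per-marker signal is identical in both cases. Under this assumption, keeping marker lists of comparable length across cell types would matter more than any fixed cap.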

Example