LASeeker opened this issue 1 year ago
We are planning to release sctype 2.0 early next year (though it may appear on GitHub much earlier), adding many new cell types and analysis options. Thanks for the suggestions. Please send more markers with corresponding references if you have them.
Interestingly, we had a similar situation in our lab. We noticed that the issue originated from the fact that not all the marker genes made the cut of the highly variable genes (HVGs).
To keep the object slimmer, we do not scale all the features in the object. Since the tool relies on extracting the `scale.data` slot, marker genes that are missing from it cannot contribute, and the scoring is affected. In particular, when exploring the `scale.data` slot, we noticed that only a few of the genes from `ScTypeDB_full.xlsx` were present.
For the positive markers:
```r
# number of positive markers per cell type that are present in scale.data
lapply(gs_list$gs_positive, function(x) {
  sum(rownames(scobj[["RNA"]]@scale.data) %in% x)
})
```
```
$Astrocytes
[1] 8
$`Cholinergic neurons`
[1] 0
$`Dopaminergic neurons`
[1] 1
$`Endothelial cells`
[1] 10
$`GABAergic neurons`
[1] 3
$`Glutamatergic neurons`
[1] 2
$`Immature neurons`
[1] 0
$`Immune system cells`
[1] 0
$`Mature neurons`
[1] 2
$`Microglial cells`
[1] 7
$`Myelinating Schwann cells`
[1] 0
$`Neural Progenitor cells`
[1] 0
$`Neural stem cells`
[1] 0
$Neuroblasts
[1] 0
$`Neuroepithelial cells`
[1] 1
$`Non myelinating Schwann cells`
[1] 0
$`Oligodendrocyte precursor cells`
[1] 4
$Oligodendrocytes
[1] 0
$`Radial glial cells`
[1] 5
$`Schwann precursor cells`
[1] 0
$`Serotonergic neurons`
[1] 0
$Tanycytes
[1] 0
$`Cancer cells`
[1] 1
$`Cancer stem cells`
[1] 0
```
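The count above says how many markers per cell type survive in `scale.data`, but not which ones are missing. A minimal base-R sketch that reports both is shown below; `marker_coverage()` is a hypothetical helper (not part of sc-type), and the toy gene sets are illustrative stand-ins for `gs_list$gs_positive`.

```r
# Hypothetical helper (not part of sc-type): for each cell type, report how many
# of its markers are present in a vector of gene names and which ones are missing.
marker_coverage <- function(gs, genes) {
  data.frame(
    cell_type = names(gs),
    n_markers = vapply(gs, function(x) length(unique(x)), integer(1)),
    n_present = vapply(gs, function(x) sum(unique(x) %in% genes), integer(1)),
    missing   = vapply(gs, function(x) paste(setdiff(x, genes), collapse = ","),
                       character(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Toy example: these gene sets and "scaled" genes are illustrative only.
gs <- list(Oligodendrocytes = c("MBP", "MOG", "PLP1"),
           Astrocytes       = c("GFAP", "AQP4"))
scaled_genes <- c("GFAP", "AQP4", "MBP")
marker_coverage(gs, scaled_genes)
```

On a Seurat object the call would presumably be `marker_coverage(gs_list$gs_positive, rownames(scobj[["RNA"]]@scale.data))`.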
The quick-and-dirty solution we used was to run `ScaleData` again, specifying the features of interest:
```r
library(Seurat)
library(magrittr)  # provides %>%
# -------------------------------------------------------------------------
# run an ad hoc scaling to include the genes used for the cell type annotation
# (the main object only has the highly variable features scaled, for speed;
# ScaleData() replaces scale.data, which will now hold only these marker genes)
scobj_test <- scobj %>%
  ScaleData(vars.to.regress = c("percent.mt.harmony", "nCount_RNA.harmony", "S.Score", "G2M.Score", "origin", "facility"),
            verbose = TRUE,
            features = unique(unlist(gs_list))) %>%
  identity()
# check the dimensions of the new scaled matrix (genes x cells)
dim(scobj_test@assays$RNA@scale.data)
# compute the sc-type scores from the rescaled matrix
es.max <- sctype_score(scRNAseqData = scobj_test[["RNA"]]@scale.data, scaled = TRUE,
                       gs = gs_list$gs_positive, gs2 = gs_list$gs_negative)
# -------------------------------------------------------------------------
```
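Scaling only the marker genes leaves `scale.data` without the highly variable genes that other steps (e.g. PCA) may rely on. One possible variation, sketched here as an assumption rather than the approach used above, is to scale the union of the variable features and those marker genes that actually exist in the assay. `features_for_scaling()` is a hypothetical base-R helper and the inputs are toy data.

```r
# Hypothetical helper: combine the highly variable genes with the sc-type
# marker genes, dropping markers that are not present in the assay.
features_for_scaling <- function(hvg, gs_list, all_genes) {
  markers <- unique(unlist(gs_list, use.names = FALSE))  # flatten positive + negative sets
  markers <- intersect(markers, all_genes)                # keep genes that exist in the data
  union(hvg, markers)                                     # HVGs first, then the extra markers
}

# Toy inputs (illustrative only)
hvg <- c("GFAP", "SNAP25")
gs_list_toy <- list(gs_positive = list(Oligodendrocytes = c("MBP", "MOG", "PLP1")),
                    gs_negative = list(Oligodendrocytes = "GFAP"))
all_genes <- c("GFAP", "SNAP25", "MBP", "MOG", "AQP4")

features_for_scaling(hvg, gs_list_toy, all_genes)
# returns c("GFAP", "SNAP25", "MBP", "MOG")
```

With a Seurat object the inputs would presumably be `VariableFeatures(scobj)`, `gs_list` and `rownames(scobj)`, and the result would be passed as `features =` to `ScaleData()`. The trade-off is a larger `scale.data` matrix in exchange for keeping both the HVGs and the markers scaled.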
After rescaling, the pool of marker genes for oligodendrocytes was better represented. For the positive markers:
```r
lapply(gs_list$gs_positive, function(x) {
  sum(rownames(scobj_test[["RNA"]]@scale.data) %in% x)
})
```
```
$Astrocytes
[1] 15
$`Cholinergic neurons`
[1] 2
$`Dopaminergic neurons`
[1] 8
$`Endothelial cells`
[1] 12
$`GABAergic neurons`
[1] 6
$`Glutamatergic neurons`
[1] 7
$`Immature neurons`
[1] 6
$`Immune system cells`
[1] 9
$`Mature neurons`
[1] 9
$`Microglial cells`
[1] 26
$`Myelinating Schwann cells`
[1] 4
$`Neural Progenitor cells`
[1] 14
$`Neural stem cells`
[1] 4
$Neuroblasts
[1] 6
$`Neuroepithelial cells`
[1] 7
$`Non myelinating Schwann cells`
[1] 4
$`Oligodendrocyte precursor cells`
[1] 6
$Oligodendrocytes
[1] 11
$`Radial glial cells`
[1] 11
$`Schwann precursor cells`
[1] 6
$`Serotonergic neurons`
[1] 4
$Tanycytes
[1] 1
$`Cancer cells`
[1] 3
$`Cancer stem cells`
[1] 6
```
Hi, amazing to hear, @IanevskiAleksandr, that you are working on further improving sctype! I don't think scaling the data would help in my case, because all the genes were already present in the `scale.data` slot. I also noticed that when I run sctype on a randomly subsampled dataset (the same number of nuclei per manually annotated cell type), it usually performs better and detects oligodendrocytes. So I think it is not an oligodendrocyte problem per se but something else. Could it be related to oligodendrocytes being the most abundant cell type in the complete dataset? Interesting, @pedriniedoardo, that you have seen something similar. It would be great to hear from the community whether this happens with other cell types, too.
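To reproduce the balanced comparison described above, a minimal base-R sketch of drawing the same number of cells per manually annotated cell type could look like this. `balanced_subsample()` is a hypothetical helper and the labels are toy data; on a Seurat object the labels would come from whichever metadata column holds the manual annotation, and the indices could presumably be used as `subset(scobj, cells = colnames(scobj)[keep])`.

```r
# Hypothetical helper: return the indices of an equal-sized random subsample
# of cells from each cell type (cell types with fewer cells are kept whole).
balanced_subsample <- function(labels, n_per_type = min(table(labels)), seed = 1) {
  set.seed(seed)  # reproducible draw (note: this resets the global RNG state)
  idx_by_type <- split(seq_along(labels), labels)
  picked <- lapply(idx_by_type, function(idx) {
    if (length(idx) <= n_per_type) idx
    else idx[sample.int(length(idx), n_per_type)]  # avoids sample()'s length-1 pitfall
  })
  sort(unlist(picked, use.names = FALSE))
}

# Toy labels in which oligodendrocytes are by far the most abundant type
labels <- c(rep("Oligodendrocytes", 50), rep("Astrocytes", 12), rep("Microglia", 8))
keep <- balanced_subsample(labels)
table(labels[keep])  # 8 cells of each type
```

By default the subsample size is the size of the rarest cell type, so every type contributes equally; passing a smaller `n_per_type` gives a more aggressive downsampling.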
Hi Aleksandr, I just tested your sc-type method on my dataset (https://pubmed.ncbi.nlm.nih.gov/37217978/) and it works really nicely for most cell types, so thank you for that! Below I show my first rough annotation ("unknown" turned out to be immune cells), followed by the sc-type annotation.
You will see that sc-type performed very well; however, it did not recognise oligodendrocytes (which happen to be the main focus of our lab). Would it be possible to extend the gene database to improve the detection of oligodendrocytes? We would be happy to suggest additional marker genes; PLP1 may be a good one, for example. Also, the detection of cerebellar granule cells (RELN+) was not perfect.
Cool tool, thank you!
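Until a marker such as PLP1 is added to the database itself, one way to try it locally is to append it to the prepared gene-set list before scoring. The sketch below assumes `gs_list` has the structure used in the code earlier in this thread (`gs_positive`/`gs_negative`, each a named list of character vectors); `add_markers()` is a hypothetical helper, and the starting gene set is a toy placeholder, not the actual ScTypeDB entry. Gene symbols must match the row names of the scored matrix.

```r
# Hypothetical helper: append genes to one cell type's positive or negative
# marker set (creating the entry if it does not exist yet), without duplicates.
add_markers <- function(gs_list, cell_type, genes,
                        type = c("gs_positive", "gs_negative")) {
  type <- match.arg(type)
  current <- gs_list[[type]][[cell_type]]  # NULL if the cell type is new
  gs_list[[type]][[cell_type]] <- unique(c(current, genes))
  gs_list
}

# Toy gene-set list (placeholder markers, illustrative only)
gs_list_toy <- list(gs_positive = list(Oligodendrocytes = c("MBP", "MOG")),
                    gs_negative = list(Oligodendrocytes = character(0)))
gs_list_toy <- add_markers(gs_list_toy, "Oligodendrocytes", "PLP1")
gs_list_toy$gs_positive$Oligodendrocytes
# returns c("MBP", "MOG", "PLP1")
```

The updated list would then be passed as `gs`/`gs2` to `sctype_score()` in the same way as the original `gs_list`.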