dgrun / StemID

Algorithm for the inference of cell types and lineage trees from single-cell RNA-seq data.

Wrong number of clusters and extraction barcodes #8

Open sbarvaux opened 2 months ago

sbarvaux commented 2 months ago

Hi,

I am trying to apply the RaceID/StemID pipeline to my scRNA-seq dataset. However, even though I set the number of clusters explicitly with `sc <- clustexp(sc,cln=10,sat=FALSE)`, I systematically end up with more clusters at the end. How can I control this?

Also, I start from a Seurat object. Ultimately, I would like to extract the barcodes with the highest StemID score and see which clusters they correspond to in my Seurat object.

With "combined" being my Seurat object, here is the script I used:

    combined_counts <- as.matrix(GetAssayData(combined, slot = "counts"))
    combined_meta <- combined@meta.data

    n <- colnames(combined_counts)
    b <- list(n[grep("^CON89",n)], n[grep("^CON90",n)])

    # Create SCseq object for RaceID + batch effect correction
    sc <- SCseq(combined_counts)
    sc <- filterdata(sc, LBatch=b, bmode="RaceID", mintotal = 1000) # Adjust 'mintotal' based on your data
    sc <- compdist(sc, metric = "pearson")
    sc <- clustexp(sc)

    sc <- clustexp(sc,cln=10,sat=FALSE)

    sc <- findoutliers(sc)

    plotbackground(sc)
    plotsensitivity(sc)

    plotoutlierprobs(sc)

    clustheatmap(sc)

    # Run t-SNE
    sc <- comptsne(sc)
    sc <- compumap(sc)
    saveRDS(sc, file="sc_object_final_before_StemID.rds")

    # Run RaceID and StemID analysis
    stem <- Ltree(sc)
    stem <- compentropy(stem)
    stem <- projcells(stem, cthr = 5, nmode = FALSE)
    stem <- projback(stem, pdishuf = 100)
    stem <- lineagegraph(stem)

    stem <- comppvalue(stem, pthr = 0.05)

    # Identify stem cell clusters
    stemID_scores <- compscore(stem)
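For the second question (mapping high-scoring cells back to Seurat clusters), here is a minimal base-R sketch. The StemID score is computed per cluster, so "barcodes with the highest score" is read here as "barcodes of the cells in the highest-scoring cluster". All three vectors below are hypothetical stand-ins: `stemid_score` for a per-cluster score vector, `race_part` for the final RaceID partition named by barcode, and `seurat_cluster` for `combined@meta.data$seurat_clusters` named by barcode. How these are pulled out of the real objects is not shown in this thread and is left as an assumption.

```r
# Hypothetical per-cluster StemID scores, named by RaceID cluster number.
stemid_score <- c("1" = 0.8, "2" = 3.1, "3" = 1.4)

# Hypothetical RaceID partition: names are cell barcodes, values are clusters.
race_part <- c(AAAC = 1, AAAG = 2, AACT = 2, AAGT = 3, ACGT = 2)

# Hypothetical Seurat cluster labels for the same barcodes.
seurat_cluster <- c(AAAC = "0", AAAG = "4", AACT = "4", AAGT = "1", ACGT = "2")

# Cluster with the highest score.
top_cluster <- as.integer(names(stemid_score)[which.max(stemid_score)])

# Barcodes of all cells assigned to that cluster.
top_barcodes <- names(race_part)[race_part == top_cluster]

# Count how these barcodes are distributed over the Seurat clusters.
overlap <- table(seurat_cluster[top_barcodes])
print(top_barcodes)
print(overlap)
```

Because `filterdata` can discard cells, the partition may cover only a subset of the Seurat barcodes, which is why the lookup goes by barcode name rather than by position.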

dgrun commented 2 months ago

The reason you get more clusters is the outlier identification step. If you would like to avoid outlier clusters, then set the probability threshold for outlier detection to zero:

sc <- findoutliers(sc,probthr=0)
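To see how the outlier step inflates the cluster count, it can help to compare the partition before and after it. The sketch below uses two hypothetical vectors, `kpart` (clusters from `clustexp`) and `cpart` (clusters after `findoutliers`). To my recollection RaceID stores these as `sc@cluster$kpart` and `sc@cpart`, but treat those slot names as an assumption and check them against your installed version.

```r
# Hypothetical partitions, named by cell; replace with the real ones.
kpart <- c(c1 = 1, c2 = 1, c3 = 2, c4 = 2, c5 = 3, c6 = 3)  # before outlier step
cpart <- c(c1 = 1, c2 = 4, c3 = 2, c4 = 2, c5 = 3, c6 = 5)  # after outlier step

# Number of clusters before and after.
n_before <- length(unique(kpart))
n_after <- length(unique(cpart))

# Cross-tabulation: rows are original clusters, columns are final clusters.
print(table(before = kpart, after = cpart))

# Cluster numbers that exist only after the outlier step.
outlier_clusters <- setdiff(unique(cpart), unique(kpart))
print(outlier_clusters)
```

In this toy example the three requested clusters become five, because two cells were split off into clusters of their own.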

sbarvaux commented 2 months ago

Thank you for the suggestion, I will try this in my next run. I also tried to apply your code from https://github.com/dgrun/StemID/issues/7 to use my Seurat clusters directly; however, I encounter this error with the projcells function:

    Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length

Is there a way to fix this issue?

Thanks in advance

dgrun commented 2 months ago

I would need sample code with data to check the problem...

sbarvaux commented 2 months ago

I am sending you an email now to your email address dominic.gruen@uni-wuerzburg.de with a subset of my data

Thanks !

dgrun commented 1 month ago

Hi, your code is fine, but the problem is that your cluster numbers for Seurat are incomplete:

    part <- as.numeric(combined@meta.data$seurat_clusters)
    sort(unique(part))
    [1] 1 2 3 5 6 7 8 10 11

You are lacking clusters 4 and 9. RaceID/StemID expects all numbers between 1 and 11 to be present. Maybe you could renumber the clusters consecutively from 1 to 9.
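A minimal base-R sketch of such a renumbering follows. The factor `clusters` is a hypothetical stand-in for `combined@meta.data$seurat_clusters`. One plausible cause of the gaps (an assumption, not confirmed in this thread) is that the factor still carries empty levels, for example after subsetting, since `as.numeric()` on a factor returns level codes rather than the labels.

```r
# Hypothetical stand-in for combined@meta.data$seurat_clusters:
# levels "0".."10" are defined, but no cell belongs to "3" or "8".
clusters <- factor(
  c("0", "1", "2", "4", "5", "6", "7", "9", "10", "2"),
  levels = as.character(0:10)
)

# Level codes keep the gaps left by the empty levels.
part <- as.numeric(clusters)
print(sort(unique(part)))

# Dropping unused levels first yields consecutive codes.
part_fixed <- as.integer(droplevels(clusters))
print(sort(unique(part_fixed)))

# Equivalent for a plain numeric vector with gaps.
part_fixed2 <- match(part, sort(unique(part)))
```

Both variants keep the relative order of the clusters, so it is worth storing a lookup between old and new numbers if you need to map results back to the Seurat labels later.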