bodegalab / irescue

Uncertainty-aware quantification of Transposable Elements expression in scRNA-seq
MIT License

Confused number of clusters by TE matrix #3

Closed: xiangyupan closed this issue 9 months ago

xiangyupan commented 2 years ago

Hi bepoli, it's me again. I successfully ran IRescue and got the three files (matrix.mtx.gz, features.tsv.gz and barcodes.tsv.gz) for each time point. I then ran the following commands to add the TE assay alongside the RNA assay:

    dpa0.data <- Read10X(data.dir = "/public1/home/sc60481/Axolotl/sc-RNA/dpa0/outs/filtered_feature_bc_matrix")
    dpa0 <- CreateSeuratObject(counts = dpa0.data, project = "dpa0", min.cells = 3, min.features = 100)
    dpa0.te.data <- Seurat::Read10X('./dpa0/outs/IRescue_out/', gene.column = 1, cell.column = 1)
    te.assay <- Seurat::CreateAssayObject(dpa0.te.data)
    te.assay <- subset(te.assay, colnames(te.assay)[which(colnames(te.assay) %in% colnames(dpa0))])
    dpa0[['TE']] <- te.assay

Because the scRNA-seq data had already been analyzed and integrated, with cell-type annotations, before I ran IRescue, I found that the TE assay of each stage cannot be added to the existing Seurat object. So I re-ran each stage with the commands above, merged all seven stages with Harmony, and ran normalization, scaling and FindClusters on the merged object. [image] As the species I use has 48 TE subfamilies, the TE matrix is 48 subfamilies × N cells. [image]

Am I right? I do not understand why the matrix is not individual TEs × N cells. My second point of confusion is that when I run FindClusters with resolution < 1.0 I get only 3 clusters, while with resolution > 1.0 (I tried 1.0001) the number of clusters increases to ~9000. I think I must have made a mistake somewhere. I hope you can help me. Thank you very much. Xiangyu

bepoli commented 2 years ago

Hi,

Because the scRNA-seq data had already been analyzed and integrated, with cell-type annotations, before I ran IRescue, I found that the TE assay of each stage cannot be added to the existing Seurat object.

It's not clear to me what the problem is. Is it that you cannot add an assay with TE counts to an existing Seurat object? If so, do you get an error message?

Am I right? I do not understand why the matrix is not individual TEs × N cells.

It looks to me like you have TE subfamilies in the rownames and cell barcodes in the colnames, as it should be. Can you explain what it is that you do not understand about this matrix?

My second point of confusion is that when I run FindClusters with resolution < 1.0 I get only 3 clusters, while with resolution > 1.0 (I tried 1.0001) the number of clusters increases to ~9000.

I know that FindClusters increases the number of clusters quite a lot for resolution values greater than 1.0. However, 9000 seems too many, and unfortunately I have never encountered this issue. Maybe the number of TE features in this species is too low for the nearest-neighbour algorithm to work properly, but you could try changing the clustering method or adjusting other parameters. By the way, did you normalize and scale the counts?

If you cannot find a way to find clusters based on those 48 TEs, you could use the clusters that you found by gene expression analysis and see if you can find TEs that are enriched in these clusters.

xiangyupan commented 2 years ago

Sorry for not being clear. I already had an integrated object from the seven stages before I ran IRescue on each stage. My question is how to add each stage's TE assay to the integrated Seurat object (an rds file), shown below. It seems that the te.assay of each stage can only be added to the RNA assay of the corresponding stage, not to the integrated assay. [image]

As for the TE matrix: since we give IRescue a TE bed file containing millions of TEs, I thought we would get an expression matrix of each individual TE in each cell. What we actually get is the expression of TE subfamilies for each cell barcode. As you said, if it is correct to have TE subfamilies in the rownames and cell barcodes in the colnames, how can we explore which sets of transposable elements are cell type-specific?

Thanks for your work.

bepoli commented 2 years ago

ok, the problem is that all the assays in a Seurat object need to have the same cells. This is why you subset the te.assay before you add it to the dpa0 object. If you want to add te.assay to another object, you still need to have the same cells. So if te.assay is a superset of obj, you have to subset it. Or, if obj is a superset of te.assay, you need to subset obj (or, if you do not want to subset, you can add the missing cells to te.assay with zero TE counts, as you prefer).
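A minimal sketch of both options in one step, assuming the TE counts are a sparse matrix such as the one returned by Read10X. `align_to_cells` is a hypothetical helper (not part of IRescue or Seurat): it drops TE columns for cells absent from the target object and adds all-zero columns for cells that have no TE counts. It also assumes the barcodes are spelled identically in both objects; if merging added per-sample prefixes or suffixes to the cell names, the TE barcodes would need the same renaming first.

```r
library(Matrix)

# Hypothetical helper: reshape a features x cells sparse count matrix so that
# its columns are exactly `target_cells`, in that order.
# - cells in `counts` but not in `target_cells` are dropped (subsetting)
# - cells in `target_cells` but not in `counts` get all-zero columns (padding)
# Assumes `counts` is a sparse Matrix with colnames and `target_cells` is unique.
align_to_cells <- function(counts, target_cells) {
  trip <- summary(counts)  # non-zero entries as (i, j, x) triplets
  # Map each old column index to its position in the target cell vector
  new_j <- match(colnames(counts)[trip$j], target_cells)
  keep <- !is.na(new_j)    # entries whose cell exists in the target
  sparseMatrix(
    i = trip$i[keep],
    j = new_j[keep],
    x = trip$x[keep],
    dims = c(nrow(counts), length(target_cells)),
    dimnames = list(rownames(counts), target_cells)
  )
}
```

The aligned matrix could then be attached with something like `obj[['TE']] <- Seurat::CreateAssayObject(counts = align_to_cells(te.counts, colnames(obj)))`, where `obj` and `te.counts` are placeholder names for the target Seurat object and the TE count matrix.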

how can we explore which sets of transposable elements are cell type-specific?

If you resolved the cell types in your gene expression assay (so you have N clusters based on gene expression), you can easily visualize and analyse the expression of TEs in these N clusters. To find the specific TEs, you can use Seurat's FindMarkers or FindAllMarkers functions, or other packages for single-cell differential expression. You just need to set the default assay to "TE" and the default identities to the N clusters in which you want to find the specific TEs. For visualization, use the Seurat functions FeaturePlot, DotPlot, or whatever you find suitable.
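As a rough sketch of those steps, assuming the object already holds a "TE" assay and that the gene-expression clusters are stored in a metadata column (here `seurat_clusters`, which is an assumption about your object). `find_te_markers` is a hypothetical wrapper, not a Seurat or IRescue function, and the log-normalization step is one reasonable default rather than a required choice.

```r
library(Seurat)

# Hypothetical wrapper: find TE subfamilies enriched in existing clusters.
# `cluster_col` and `te_assay` are assumed names; adjust them to your object.
find_te_markers <- function(obj, cluster_col = "seurat_clusters", te_assay = "TE") {
  DefaultAssay(obj) <- te_assay   # run everything below on the TE assay
  obj <- NormalizeData(obj)       # log-normalize TE counts (default assay)
  Idents(obj) <- cluster_col      # use gene-expression clusters as identities
  # Test each cluster against all other cells, keeping enriched TEs only
  FindAllMarkers(obj, only.pos = TRUE)
}
```

The returned data frame has one row per TE subfamily and cluster, with the cluster in the `cluster` column and the feature name in the `gene` column.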