farrellja / URD

URD - Reconstruction of Branching Developmental Trajectories
GNU General Public License v3.0
117 stars 41 forks source link

Finding cluster specific marker genes #22

Closed beginner984 closed 5 years ago

beginner984 commented 5 years ago

Hi,

I have 3 clusters of cells, I want to find marker genes specific to each cluster; By markersAUCPR function in URD package I am doing so

> markersAUCPR(object, clust.1 = "1", clust.2 = NULL, cells.1 = NULL,
+              cells.2 = NULL, clustering = "Louvain-60", effect.size = 2,
+              frac.must.express = 0.2, frac.min.diff = 0.2, genes.use = NULL)

I think clust.1 = "1" vs clust.2 = NULL means compare cells in cluster 1 vs all cells in two other clusters and return marker genes specific to cluster 1; By doing so I am getting 9000 marker genes while I only have 11000 genes in my genome. Additionally, when I am plotting the expression of marker genes defined for cluster 1 on URD cluster, I expect only these genes become yellow in cells in cluster 1 but all heat map become yellow (please look at picture).

[

](url) So I suspect if my interpretation of this function for finding marker genes specific to each cluster is right. I read the source code but nothing new I got from that.

Could somebody have a look at this function and tell me if I am wrong?

Thanksa lot

farrellja commented 5 years ago

Hi, beginner

Your code looks fine to me. However, markersAUCPR returns all genes that that pass the frac.must.express and frac.min.diff parameters (in this case, I suppose ~9,000 of 11,000), ranked by how good of a marker they are (AUCPR) in the resultant data.frame. So, you don't want to use all the genes returned necessarily. You can just consider the top ranked genes, or use aucprThreshold to determine what would be the AUCPR value for a random classifier, and take genes that exceed it by some amount (for instance, genes that are 1.5x better at classifying cells between clusters than random.)

beginner984 commented 5 years ago

Excuse me once more,

By below function, I tried to extract marker genes specific to cluster 3 of cells on my URD object;

marker_genes_3=markersAUCPR(objec, clust.1 = "3", clust.2 = NULL, clustering="Louvain-50", effect.size = 1,frac.must.express = 0.1, frac.min.diff = 0, genes.use=object@var.genes)

That gave me 3278 genes; Next, using marker_genes_3[marker_genes_3$AUCPR>0.3,] I selected 278 top genes and I plotted the scaled expression values of them again all my cells (3 clusters) that returned a frustrating heat map. Because I expected these 278 genes only show expression in cluster 3 of cells as they are supposed to be marker genes of cluster 3 while they are showing an sporadic expression

I plotted this heat map by seurat

> DoHeatmap(object = seurat,
genes.use = rownames(marker_genes_3[marker_genes_3$AUCPR>0.3,]),group.by ="URD_cluster_lables)

Could you please give me a clue why these genes are not cluster specific while they expected to be? I have also tried for 2 other clusters that returned similar patterns.

[

](url)