cole-trapnell-lab / garnett

Automated cell type classification
MIT License

Checking my markers for upcoming "Training my own classifier" steps #62

Closed: techgirl2022 closed 2 years ago

techgirl2022 commented 2 years ago

I have a cds object on which I want to run a single-cell RNA-seq analysis (make a UMAP and identify cell types) using Monocle and Garnett. I was trying to check which marker genes are in my cds file.

cds1 <- readRDS('example_cds.RDS')
head(colData(cds1)) # shows annotations on each column as: cell (character), size factor (numeric), n.umi (numeric), perc_mitochondrial_umis (numeric), scrublet_score (numeric), scrublet_call (character), num_genes_expressed (integer)
head(rowData(cds1)) # shows annotations on each row as: gene_short_name (character), id (character), chromosome (character), bp1 (integer), bp2 (integer), gene_strand (character), num_cells_expressed (integer)

library(org.Mm.eg.db)
marker_file_path <- "C:/Users/[my username here]/Downloads/kidney_marker_genes.txt"
marker_check <- check_markers(cds1, marker_file_path, db=org.Mm.eg.db, cds_gene_id_type = "SYMBOL", marker_file_gene_id_type = "SYMBOL")

plot_markers(marker_check)

However, I'm getting an error (see attached screenshot: check_markers_error).

Any suggestions on what I should do to troubleshoot this step?

hpliner commented 2 years ago

Hello, this error usually means that there's a mismatch between the format of the gene IDs in the database and the format in your cds object. You're using the Mm database, so just to check: does your cds have genes (in the row.names of rowData) in standard mouse symbol format (e.g. Cd4)? And your marker file as well?

techgirl2022 commented 2 years ago

Yes, I used the code head(rowData(cds)) to check the format of my genes in my cds object. It shows gene ID (ENSMUSG followed by 11 digits) and then gene_short_name in standard mouse symbol format (e.g. Gnai3), and for my marker file (which I manually made myself) I have it as:

kidney_marker_genes.txt

Podocytes expressed: Nphs1, Nphs2, Synpo, Cdkn1c, Wt1, Cd2ap, Podxl

and the list continues for the rest of the cell types

clee700 commented 2 years ago

Hello, sorry to butt in, but I think each cell type name in the marker file should start with > (e.g. >Podocytes). Maybe it thinks Podocytes is a gene and is trying to convert that to a gene name?
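For reference, here is a minimal sketch of that layout, written from R so it can be checked quickly. It assumes the Garnett marker file convention of a line beginning with > that names the cell type, followed by an expressed: line listing the genes. The temporary file path is only a placeholder for illustration and is not the path used above.

```r
# Placeholder path: a temporary file is used so the sketch runs anywhere.
marker_file_path <- tempfile(fileext = ".txt")

# One cell type definition: a ">" line with the name, then its marker genes.
marker_lines <- c(
  ">Podocytes",
  "expressed: Nphs1, Nphs2, Synpo, Cdkn1c, Wt1, Cd2ap, Podxl"
)
writeLines(marker_lines, marker_file_path)

# Sanity check: count the lines that start a cell type definition.
written <- readLines(marker_file_path)
n_cell_types <- sum(startsWith(written, ">"))
```

Each further cell type would follow the same two-line pattern, with its own > line.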

hpliner commented 2 years ago

Hi, sorry for the late response. If your cds object uses Ensembl IDs as its gene identifiers, you need to set cds_gene_id_type = "ENSEMBL" instead of "SYMBOL". Reopen if this doesn't solve the problem.
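One way to decide which value to pass is to look at the row names themselves. The sketch below uses a hypothetical helper, guess_gene_id_type, which is not part of garnett. It assumes mouse Ensembl gene IDs of the form ENSMUSG followed by 11 digits with no version suffix, and it treats everything else as a symbol. The example IDs are illustrative, and the check_markers call is left commented out because it needs garnett, org.Mm.eg.db and the actual cds.

```r
# Hypothetical helper (not a garnett function): returns "ENSEMBL" when every
# ID looks like a mouse Ensembl gene ID, otherwise falls back to "SYMBOL".
guess_gene_id_type <- function(ids) {
  looks_ensembl <- length(ids) > 0 && all(grepl("^ENSMUSG[0-9]{11}$", ids))
  if (looks_ensembl) "ENSEMBL" else "SYMBOL"
}

# Illustrative IDs standing in for row.names(rowData(cds1)).
example_ids <- c("ENSMUSG00000000001", "ENSMUSG00000000028")
cds_id_type <- guess_gene_id_type(example_ids)

# With the real objects loaded, the call would look something like:
# cds_id_type <- guess_gene_id_type(row.names(rowData(cds1)))
# marker_check <- check_markers(cds1, marker_file_path, db = org.Mm.eg.db,
#                               cds_gene_id_type = cds_id_type,
#                               marker_file_gene_id_type = "SYMBOL")
# plot_markers(marker_check)
```

The marker file still uses symbols here, so marker_file_gene_id_type stays "SYMBOL" and only the cds side changes.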