cole-trapnell-lab / garnett

Automated cell type classification
MIT License

Problem with sci-ATAC-seq analysis #56

Closed nbxszby416 closed 2 years ago

nbxszby416 commented 3 years ago

Hi,

I'm working through the sci-ATAC-seq part of your paper (Dataset3). I downloaded the datasets "activity_scores.binarized.rds" and "cell_metadata.txt" from the mouse atlas website, and the marker file from your supplement.

However, when I run check_markers(), the plot shows that none of the genes are in the db, and it reports: More than 70% of IDs were lost when converting to ENSEMBL IDs. Did you specify the correct gene ID types and the correct db?

I then ran train_cell_classifier(), and it reported:

genes could not be converted from SYMBOL to ENSEMBL These genes are listed below:

The following genes from the cell type definition file are not present in the cell dataset. Please check these genes for errors. Cell type determination will continue, ignoring these genes.

Cdh1 Slc26a3 Cd3g Cd79a Calb2 Slc1a2 Tek Slc5a1 Myl2 Gata2 Ager Cd34 Aldoc Irx4 Sftpa1 Pebp4 Prm1 Dlx2 Ms4a1 Vil1 Tnp1 Sele Gad2 Lrp2 Slc12a1 Tmco5 Prom2 Cd93 Calb1 Cyp7a1 Alb Npy Grin2b Slco1c1 Siglece Cd19 Ldhc Cdh5 Cdh16 Cd3e Cd3d Zic1 P2ry12 Mag Cd160 Pde3a Kcnj6 Gypa Hbb-b1 Hbb-bs Cx3cr1 Fabp1 Myl3 Zic2 Kdr Cldn19 Slc17a7 Gad1 Cd80 Mog Serpina1c Sh2d1b1 Hist1h2ba

It then stopped with the error "Not enough training samples for any cell types at root of cell type hierarchy!"

I have checked that the db contains all of the genes I'm looking for using the AnnotationDbi functions, for example:

select(org.Mm.eg.db,'Ager','ENSEMBL','SYMBOL')
'select()' returned 1:1 mapping between keys and columns
  SYMBOL            ENSEMBL
1   Ager ENSMUSG00000015452

The parameters I use are:

db = "org.Mm.eg.db" (I also tried db = "none", but that doesn't work either)
cds_gene_id_type = "SYMBOL"
marker_file_gene_id_type = "SYMBOL"

I would really appreciate any help!

hpliner commented 3 years ago

Hello,

The problem here is that the ATAC data you're referencing has its gene IDs in all caps (e.g. AGER rather than Ager). Unfortunately, this doesn't match what's in org.Mm.eg.db. You can get around this by using db = "none" and changing your marker file to use all-caps versions of the genes. Hope this helps!
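
For anyone landing here with the same problem, here is a minimal sketch of one way to upper-case the gene symbols in a marker file using base R. It is not part of garnett: the helper name uppercase_marker_genes and the toy marker lines are made up for illustration, and it assumes that gene lists sit on lines beginning with "expressed" or "not expressed" followed by a colon. It also assumes the dataset's IDs are plain upper-case versions of the mouse symbols, which is worth checking against the row names of your data before training.

```r
# Sketch: upper-case the gene symbols in a Garnett-style marker file.
# Assumption: gene lists sit on lines that start with "expressed" or
# "not expressed" followed by a colon; all other lines are left unchanged.
uppercase_marker_genes <- function(lines) {
  is_gene_line <- grepl("^[[:space:]]*(not )?expressed[^:]*:", lines)
  # Split each gene line into its keyword (up to the first colon) and the rest.
  key   <- sub("^([^:]*:).*$", "\\1", lines[is_gene_line])
  genes <- sub("^[^:]*:", "", lines[is_gene_line])
  # Keep the keyword as written and upper-case only the gene list.
  lines[is_gene_line] <- paste0(key, toupper(genes))
  lines
}

# Toy marker definitions (illustrative only, not the paper's marker file).
marker_lines <- c(
  ">B cells",
  "expressed: Cd19, Ms4a1, Cd79a",
  "",
  ">T cells",
  "expressed: Cd3d, Cd3e, Cd3g",
  "not expressed: Cd19"
)

upper_lines <- uppercase_marker_genes(marker_lines)
print(upper_lines)

# For a real file (both paths are hypothetical):
# writeLines(uppercase_marker_genes(readLines("markers.txt")), "markers_upper.txt")
```

The rewritten file would then be passed as the marker file together with db = "none", as suggested above. Cell type names and any other lines are left untouched so that only the gene symbols change.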

hpliner commented 2 years ago

I'm going to go ahead and close this. Reopen if you have further issues.