Bioconductor / AnnotationForge

Tools for building SQLite-based annotation data packages
https://bioconductor.org/packages/AnnotationForge

No gene can be mapped #34

Open shiyi-pan opened 2 years ago

shiyi-pan commented 2 years ago

Hi, I want to use AnnotationForge to create a database for GO annotation. The creation process works well, but when I use the database for GO enrichment analysis, it always shows the following message: "No gene can be mapped". In the original files, however, every gene in my test set has GO terms. Could you give me some advice? Thank you very much. Here is my creation script:

```r
library(readr)   # read_delim()
library(dplyr)   # select(), filter(), mutate(), %>%
library(tidyr)   # separate_rows()

# Read the eggNOG-mapper annotations and keep/rename the columns of interest
emapper <- read_delim(
  file = "MM_s2sos6ca.emapper.annotations.txt",
  delim = "\t", escape_double = FALSE, col_names = TRUE,
  comment = "#", trim_ws = TRUE
) %>%
  dplyr::select(
    GID = query, Gene_Symbol = Preferred_name, OG = eggNOG_OGs,
    GO = GOs, KO = KEGG_ko, Pathway = KEGG_Pathway,
    Gene_Name = seed_ortholog
  )

# Gene ID to gene name table
gene_info <- dplyr::select(emapper, GID, Gene_Name) %>%
  dplyr::filter(!is.na(Gene_Name))

# Gene ID to GO term table: one row per (gene, GO term) pair
gene2go <- dplyr::select(emapper, GID, GO) %>%
  separate_rows(GO, sep = ",", convert = FALSE) %>%
  filter(!is.na(GO)) %>%
  mutate(EVIDENCE = "IEA")

# Build the OrgDb package
AnnotationForge::makeOrgPackage(
  gene_info = gene_info, go = gene2go,
  version = "1.0", maintainer = "1343213784@qq.com", author = "panshiyi",
  tax_id = "3847", genus = "G", species = "m",
  outputDir = getwd(), goTable = "go"
)
```

Here is the R log:

Populating genes table:
genes table filled
Populating gene_info table:
gene_info table filled
Populating go table:
go table filled
table metadata filled
'select()' returned many:1 mapping between keys and columns
Dropping GO IDs that are too new for the current GO.db
Populating go table:
go table filled
Populating go_bp table:
go_bp table filled
Populating go_cc table:
go_cc table filled
Populating go_mf table:
go_mf table filled
'select()' returned many:1 mapping between keys and columns
Populating go_bp_all table:
go_bp_all table filled
Populating go_cc_all table:
go_cc_all table filled
Populating go_mf_all table:
go_mf_all table filled
Populating go_all table:
go_all table filled
Creating package in D:/NN1138-2GO注释R包/org.Gm.eg.db
Now deleting temporary database file
[1] "D:/NN1138-2GO注释R包/org.Gm.eg.db"
There were 50 or more warnings (use warnings() to see the first 50)

Here is my GO analysis script:

```r
gene <- read.table(file = "test_degs.txt", header = TRUE)

de_ego <- enrichGO(
  gene = gene, OrgDb = org.Gm.eg.db, keyType = "GID",
  ont = "BP", qvalueCutoff = 0.05, pvalueCutoff = 0.05
)
```

Here is the R log:

--> No gene can be mapped....
--> Expected input gene ID: NN06g00399.1,NN13g00022.1,NN17g00152.1,NN15g02698.1,NN10g01422.1,NN11g01444.1
--> return NULL...
Warning message:
call dbDisconnect() when finished working with a connection

phoebee-h commented 2 years ago

Hi, I am just a user, not a member of the clusterProfiler or AnnotationForge development teams. I would suggest checking your GID values and your input file "test_degs.txt" first. This does not look like an error from creating your OrgDb. The message indicates that the OrgDb keys of keyType "GID" do not match the genes in your DEG input.
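One way to run that check is to compare the input IDs directly against the keys stored in the OrgDb. The snippet below is only a minimal sketch: `compare_ids` is a hypothetical helper (not part of AnnotationForge or clusterProfiler), and the toy vectors stand in for the real data. With the generated package loaded, the keys would presumably come from `AnnotationDbi::keys(org.Gm.eg.db, keytype = "GID")`.

```r
# Hypothetical helper (not from any package): report how many input IDs
# are present among the OrgDb keys, and which ones are missing.
compare_ids <- function(input_ids, db_keys) {
  input_ids <- trimws(as.character(input_ids))  # drop stray whitespace
  matched <- input_ids %in% db_keys
  list(
    n_input   = length(input_ids),
    n_matched = sum(matched),
    unmatched = input_ids[!matched]
  )
}

# Toy data standing in for the real objects. With the real package, use e.g.
#   db_keys <- AnnotationDbi::keys(org.Gm.eg.db, keytype = "GID")
db_keys   <- c("NN06g00399.1", "NN13g00022.1", "NN17g00152.1")
input_ids <- c("NN06g00399.1 ", "NN13g00022.1", "NN99g99999.1")

res <- compare_ids(input_ids, db_keys)
str(res)
```

If `n_matched` is zero on the real data, the IDs genuinely differ (for example in version suffix or whitespace); if most IDs match, the problem is more likely in how the IDs are passed to `enrichGO`.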

shiyi-pan commented 2 years ago

Thank you for your reply. I don't know why the OrgDb keyType "GID" doesn't match the genes in my DEG input. Here is my gene2go file:

  GID          GO         EVIDENCE
1 NN01g00003.1 GO:0005575 IEA
2 NN01g00003.1 GO:0016020 IEA
3 NN01g00010.1 GO:0003674 IEA
4 NN01g00010.1 GO:0003824 IEA
5 NN01g00010.1 GO:0005575 IEA
6 NN01g00010.1 GO:0005622 IEA

Here is my gene info file:

  GID          Gene_Name
1 NN01g00002.1 3847.GLYMA01G00321.2
2 NN01g00003.1 3847.GLYMA14G00270.1
3 NN01g00005.1 3847.GLYMA08G47480.4
4 NN01g00006.1 3760.EMJ24469
5 NN01g00007.1 3847.GLYMA01G00400.5
6 NN01g00008.1 3847.GLYMA01G00410.3

Here are my input DEGs; each of them has more than one GO term:

NN06g00399.1 NN13g00022.1 NN17g00152.1 NN15g02698.1 NN10g01422.1 NN11g01444.1

Could you give me some advice? Thank you again.
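One thing that may be worth ruling out, offered as an assumption since the thread does not confirm it: `read.table()` returns a data.frame, while `enrichGO()` documents its `gene` argument as a vector of gene IDs. In addition, `header = TRUE` on a file without a header line turns the first ID into a column name. The sketch below uses an assumed layout for `test_degs.txt` (one ID per line, no header) and shows how to pull out a plain character vector.

```r
# Toy stand-in for test_degs.txt: one gene ID per line, no header line.
# This layout is an assumption; the thread does not show the file itself.
deg_text <- "NN06g00399.1\nNN13g00022.1\nNN17g00152.1"

# header = FALSE, because header = TRUE would consume the first ID
# as a column name in a headerless file.
gene_df <- read.table(text = deg_text, header = FALSE,
                      stringsAsFactors = FALSE)

class(gene_df)  # a data.frame, not a character vector

# Extract the first column as a plain character vector of IDs.
gene_ids <- as.character(gene_df[[1]])

# With the real OrgDb package loaded, the call would then look like:
#   enrichGO(gene = gene_ids, OrgDb = org.Gm.eg.db, keyType = "GID",
#            ont = "BP", pvalueCutoff = 0.05, qvalueCutoff = 0.05)
```

For the real file, `read.table(file = "test_degs.txt", header = FALSE)` would replace the `text =` call if the file indeed has no header line.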