Bioconductor / AnnotationForge

Tools for building SQLite-based annotation data packages
https://bioconductor.org/packages/AnnotationForge

'goTable' GO Ids must be formatted like 'GO:XXXXXXX' #18

Closed: najibveto closed this issue 3 years ago

najibveto commented 3 years ago

Hello, I am trying to build my organism package using the following code:

rm(list = ls())
options(stringsAsFactors = F)
library(tidyverse)
library(clusterProfiler)
library(AnnotationHub)
library(AnnotationForge)
setwd("D:/New folder/najib/Project05")
egg <- rio::import('N402-annotation.tsv')
egg[egg==""] <- NA 
colnames(egg)
gene_info <- egg %>% dplyr::select(GID = query_name, GENENAME = seed_ortholog) %>% na.omit()
gterms <- egg %>%
  dplyr::select(query_name, GOs) %>% na.omit()
library(stringr)
all_go_list=str_split(gterms$GOs,",")
gene2go <- data.frame(GID = rep(gterms$query_name,
                                times = sapply(all_go_list, length)),
                      GO = unlist(all_go_list),
                      EVIDENCE = "IEA")
gene2go=gene2go[-1,]
gene2ko <- egg %>%
  dplyr::select(GID = query_name, KO = KEGG_ko) %>%
  na.omit()
load("kegg_info.RData")
colnames(ko2pathway)=c("KO",'Pathway')
library(stringr)
gene2ko$KO=str_replace(gene2ko$KO,"ko:","")
gene2pathway <- gene2ko %>% left_join(ko2pathway, by = "KO") %>% 
  dplyr::select(GID, Pathway) %>%
  na.omit()
makeOrgPackage(gene_info=gene_info,
               go=gene2go,
               ko=gene2ko,
               maintainer='najib <najibveto@gmail.com>',
               author='najib <najibveto@gmail.com>',
               pathway=gene2pathway,
               version="0.0.1",
               outputDir = "D:/New folder/najib/Project05",
               tax_id=5061,
               genus="Aspergillus",
               species="nigerN402",
               goTable="go")

However, I got the following error:

Populating genes table:
genes table filled
Populating gene_info table:
gene_info table filled
Populating go table:
go table filled
Populating ko table:
ko table filled
Populating pathway table:
pathway table filled
table metadata filled
Error in makeOrgDbFromDataFrames(data, tax_id, genus, species, dbFileName,  : 
  'goTable' GO Ids must be formatted like 'GO:XXXXXXX'

I checked the gene2go table and its GO column. What is a possible solution for this issue? Thank you for your help.

najibveto commented 3 years ago

I found the cause of the problem: some rows in the GO column contained a "-" character instead of a GO ID.
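For anyone hitting the same error, here is a minimal sketch of how such rows could be found and dropped before calling makeOrgPackage. The toy data frame below is made up and only stands in for the gene2go table from the original post. The pattern assumes a valid ID is 'GO:' followed by exactly seven digits, as the error message suggests; the exact check inside AnnotationForge may differ.

```r
# Toy stand-in for the gene2go table (values are invented for illustration).
gene2go <- data.frame(
  GID = c("g1", "g1", "g2", "g3"),
  GO = c("GO:0008150", "GO:0003674", "-", "GO:0005575"),
  EVIDENCE = "IEA",
  stringsAsFactors = FALSE
)

# TRUE where the GO column looks like 'GO:' plus exactly seven digits
# (assumed format, taken from the wording of the error message).
is_valid_go <- grepl("^GO:[0-9]{7}$", gene2go$GO)

# Inspect the offending rows before dropping them.
bad_rows <- gene2go[!is_valid_go, ]
print(bad_rows)

# Keep only the rows with well-formed GO IDs.
gene2go_clean <- gene2go[is_valid_go, ]
```

Passing the filtered table as the go argument would then avoid the placeholder rows. Printing bad_rows first is worthwhile, since it shows whether "-" is the only unexpected value.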

yufu0110 commented 1 year ago

I met the same problem. How could I solve it?

jmacdon commented 1 year ago

@yufu0110 This isn't the place to ask for help with simple R text manipulations. In the future, please use r-help@r-project.org.

For your question, please see ?gsub
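As a rough sketch of how the functions documented under ?gsub might apply here, the raw comma-separated GO strings could be cleaned before they are split into one row per ID. The raw_go vector is hypothetical, and the seven-digit pattern is an assumption based on the error message rather than on the package's actual check.

```r
# Hypothetical raw GO annotations, one comma-separated string per gene.
raw_go <- c("GO:0008150, GO:0003674", "-", " GO:0005575 ")

# gsub() replaces every match of the pattern; here it strips all whitespace.
no_space <- gsub("[[:space:]]+", "", raw_go)

# Split on commas, then keep only tokens that look like well-formed GO IDs.
go_list <- lapply(strsplit(no_space, ",", fixed = TRUE), function(ids) {
  ids[grepl("^GO:[0-9]{7}$", ids)]
})

# Number of valid GO IDs retained per gene.
n_valid <- lengths(go_list)
print(n_valid)
```

Genes whose entry is only a placeholder end up with zero IDs, so they contribute no rows when the list is expanded with rep() and unlist().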