Bioconductor / AnnotationForge

Tools for building SQLite-based annotation data packages
https://bioconductor.org/packages/AnnotationForge

Error in FUN(X[[i]], ...) : data.frames in '...' cannot contain duplicated rows #28

Closed (najibveto closed this issue 2 years ago)

najibveto commented 2 years ago

Hello, I am working on a non-model organism, so I tried to build the organism package with the makeOrgPackage function as follows. First I annotated the genes of my organism using eggNOG-mapper, then I loaded the generated table into RStudio and used the following code:

    rm(list = ls())
    options(stringsAsFactors = F)
    library(tidyverse)
    library(clusterProfiler)
    library(AnnotationHub)
    library(AnnotationForge)
    egg <- rio::import('fatheadminnow-annotation.tsv')
    egg[egg==""] <- NA
    colnames(egg)
    gene_info <- egg %>% dplyr::select(GID = query_name, GENENAME = seed_ortholog) %>% na.omit()
    gterms <- egg %>% dplyr::select(query_name, GOs) %>% na.omit()
    gterms <- gterms[!grepl("-", gterms$GOs),]
    library(stringr)
    all_go_list = str_split(gterms$GOs, ",")
    gene2go <- data.frame(GID = rep(gterms$query_name, times = sapply(all_go_list, length)),
                          GO = unlist(all_go_list),
                          EVIDENCE = "IEA")
    gene2go <- gene2go[!grepl("-", gene2go$GO),]
    gene2ko <- egg %>% dplyr::select(GID = query_name, KO = KEGG_ko) %>% na.omit()
    load("kegg_info.RData")
    colnames(ko2pathway) = c("KO", 'Pathway')
    library(stringr)
    gene2ko$KO = str_replace(gene2ko$KO, "ko:", "")
    gene2ko <- gene2ko[!grepl("-", gene2ko$KO),]
    gene2pathway <- gene2ko %>% left_join(ko2pathway, by = "KO") %>% dplyr::select(GID, Pathway) %>% na.omit()
    makeOrgPackage(gene_info=gene_info,
                   go=gene2go,
                   ko=gene2ko,
                   maintainer='gmail.com>',
                   author='gmail.com>',
                   pathway=gene2pathway,
                   version="0.0.1",
                   outputDir = "C:/Users/Documents",
                   tax_id=90988,
                   genus="Pimephales",
                   species="promelas",
                   goTable="go")

And I got the following error:

    Error in FUN(X[[i]], ...) : data.frames in '...' cannot contain duplicated rows

I have already used the package before, with the same code for another species, and it worked fine.

lshep commented 2 years ago

Did you check the objects that you created and passed as input to the makeOrgPackage function? As the error message says, the data frames passed to makeOrgPackage cannot contain duplicated rows. Perhaps it worked for the other species because its tables did not have any duplicated rows?
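One way to act on this suggestion is to count the duplicated rows in every data frame before calling makeOrgPackage. The following is only a minimal sketch in base R: the two small data frames are invented stand-ins for the real gene_info and gene2go objects, and the helper name count_dup_rows is hypothetical, not part of AnnotationForge.

```r
# Toy stand-ins for the real inputs (all values are invented for illustration).
gene_info <- data.frame(GID = c("g1", "g2", "g2"),
                        GENENAME = c("a", "b", "b"),
                        stringsAsFactors = FALSE)
gene2go <- data.frame(GID = c("g1", "g1", "g2"),
                      GO = c("GO:0008150", "GO:0003674", "GO:0008150"),
                      EVIDENCE = "IEA",
                      stringsAsFactors = FALSE)

# Hypothetical helper: number of rows that exactly repeat an earlier row.
count_dup_rows <- function(df) sum(duplicated(df))

# Check every table that would be passed to makeOrgPackage, not just some.
inputs <- list(gene_info = gene_info, go = gene2go)
dup_counts <- sapply(inputs, count_dup_rows)
print(dup_counts)

# Inspect the offending rows of one table.
dup_rows <- gene_info[duplicated(gene_info), ]
print(dup_rows)
```

In the original call, the same check would apply to all four tables (gene_info, gene2go, gene2ko and gene2pathway), since any one of them can trigger the error.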

najibveto commented 2 years ago

Thanks for your reply. I checked the gene2go and gene2ko tables and didn't find any duplicates, as you can see here: [screenshots 1 and 2]. Previously, I used the package to build the database from tables in a similar form: [screenshot 3]

And it worked fine. Instead of using the gene name, I used the transcript name in the GID column.

nturaga commented 2 years ago

Hi @najibveto

Please format your code using the Markdown formatting available in GitHub issues. It is not easy to follow your question.

vjcitn commented 2 years ago

And you'll have to provide fatheadminnow-annotation.tsv for us to reproduce the error.

najibveto commented 2 years ago

Sorry for my late reply. As for the problem, I used the transcript ID instead of the gene name and it works fine now. Thank you for your help.
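The thread does not confirm why switching to transcript IDs helped. One plausible explanation is that several transcripts shared a gene name, so rows keyed by gene name became identical. The sketch below illustrates that idea under this assumption only; the table and its column names (transcript_id, gene_name, KO) are invented and are not taken from the actual eggNOG-mapper output.

```r
# Invented annotation table: transcripts t1 and t2 belong to the same gene
# and carry the same KO annotation.
egg <- data.frame(transcript_id = c("t1", "t2", "t3"),
                  gene_name     = c("geneA", "geneA", "geneB"),
                  KO            = c("K00001", "K00001", "K00002"),
                  stringsAsFactors = FALSE)

# Gene name as GID: the rows for t1 and t2 become identical.
by_gene <- data.frame(GID = egg$gene_name, KO = egg$KO,
                      stringsAsFactors = FALSE)
has_dup_by_gene <- anyDuplicated(by_gene) > 0

# Transcript ID as GID: every row stays distinct.
by_transcript <- data.frame(GID = egg$transcript_id, KO = egg$KO,
                            stringsAsFactors = FALSE)
has_dup_by_transcript <- anyDuplicated(by_transcript) > 0

# Alternative: keep gene names but drop the repeated rows.
by_gene_unique <- unique(by_gene)

print(c(gene = has_dup_by_gene, transcript = has_dup_by_transcript))
```

If gene-level identifiers are preferred, removing repeated rows with unique() (or dplyr::distinct()) before calling makeOrgPackage is another option.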