Add annotations for KEGG pathways and genes

draeger-lab / ModelPolisher

ModelPolisher accesses the BiGG Models knowledgebase to annotate SBML models.

MIT License


Add annotations for KEGG pathways and genes #107

Open NantiaL opened 2 years ago

NantiaL commented 2 years ago

It would be very useful to have annotations for KEGG subsystems and genes assigned by ModelPolisher. This would save a lot of time and work during model curation.

To obtain the gene annotations, one of the following methods could be used:

- the GenBank file combined with old/new locus tags
- the NCBI API
- the KEGG API and the conv operation
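
As a rough illustration of the third option: the KEGG REST API exposes `conv` as `/conv/<target_db>/<source_db>` and returns tab-separated pairs, one per line, with the source identifier in the first column. The sketch below only builds the request URL and parses a response that has already been downloaded. The helper names `conv_url` and `parse_conv` and the identifiers in the sample are made up for illustration; this is not part of ModelPolisher.

```python
from urllib.parse import quote

KEGG_REST = "https://rest.kegg.jp"


def conv_url(target_db: str, source_db: str) -> str:
    """Build the URL for a KEGG `conv` request (target first, then source)."""
    return f"{KEGG_REST}/conv/{quote(target_db)}/{quote(source_db)}"


def parse_conv(text: str) -> dict[str, list[str]]:
    """Parse a `conv` response into {source_id: [target_id, ...]}.

    Each non-empty line is expected to hold two tab-separated identifiers.
    A source identifier may map to more than one target, hence the list.
    """
    mapping: dict[str, list[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        source_id, target_id = line.split("\t")
        mapping.setdefault(source_id, []).append(target_id)
    return mapping


# Placeholder response text; the organism code and IDs are invented.
sample = "org:LOCUS_0001\tncbi-geneid:1000001\norg:LOCUS_0002\tncbi-geneid:1000002\n"
genes = parse_conv(sample)
```

Fetching the URL (for example with `urllib.request.urlopen`) is left out on purpose, so the parsing step can be tried offline.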

matthiaskoenig commented 2 years ago

Just to comment on this. KEGG is basically dead for many researchers since it moved behind a non-open license. This makes it basically impossible to work with KEGG data and annotations. Many researchers have dropped KEGG in recent years (as did I). I would not recommend putting any effort into supporting KEGG; instead, use open alternatives such as Reactome.

Schmoho commented 1 month ago

I am not quite sure I understand this correctly. @NantiaL could you maybe provide an example of what you are asking for?

NantiaL commented 1 month ago

By this I mean a simple annotation of the genes within the model. Given a RefSeq annotation file with multiple entries such as gene, CDS, etc.:

NC_XXX.1    RefSeq  gene    12508   13482   .   +   .   ID=gene9;Dbxref=GeneID:4917798;Name=XXX;gbkey=Gene;gene_biotype=protein_coding;locus_tag=XXX
NC_XXX.1    RefSeq  CDS 12508   13482   .   +   0   ID=cds9;Parent=gene9;Dbxref=Genbank:YP_XXX.1,GeneID:XXX;Name=YP_XXX.1;gbkey=CDS;product=XXX;protein_id=YP_XXX.1;transl_table=11

extract information, e.g., locus tag, gene ID, and name, and add it as annotations (CV terms) to the model. Similar information can also be extracted from the GenBank annotation file.

This would require introducing a new input parameter, meaning the user would need to provide the .gff file when running ModelPolisher.

GwennyGit commented 1 month ago

We already implemented something similar in refineGEMs. 🤔 See the function cv_ncbiprotein.

The section Functions to add additional URIs to GeneProducts in the same module also contains functions to add more annotations to the GeneProducts.