Open NantiaL opened 2 years ago
Just to comment on this: KEGG is effectively dead for many researchers since it moved behind a non-open license, which makes it nearly impossible to work with KEGG data and annotations. Many researchers have dropped KEGG in recent years (as did I). I would not recommend putting any effort into supporting KEGG; instead, use open alternatives such as Reactome.
I am not quite sure I understand this correctly. @NantiaL could you maybe provide an example of what you are asking for?
By this I mean a simple annotation of the genes within the model. Given a RefSeq annotation file with multiple entries such as gene, CDS, etc.:
NC_XXX.1 RefSeq gene 12508 13482 . + . ID=gene9;Dbxref=GeneID:4917798;Name=XXX;gbkey=Gene;gene_biotype=protein_coding;locus_tag=XXX
NC_XXX.1 RefSeq CDS 12508 13482 . + 0 ID=cds9;Parent=gene9;Dbxref=Genbank:YP_XXX.1,GeneID:XXX;Name=YP_XXX.1;gbkey=CDS;product=XXX;protein_id=YP_XXX.1;transl_table=11
extract information such as the locus tag, gene ID, and name, and add it as annotations (CV terms) in the model. Similar information can also be extracted from the GenBank annotation file.
This would require introducing a new input parameter, meaning the user would need to provide the .gff file when running ModelPolisher.
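For illustration, here is a minimal Python sketch of the extraction step described above. It is not ModelPolisher or refineGEMs code: the function names `parse_gff_attributes` and `extract_gene_annotations` are hypothetical, and it assumes a tab-separated GFF3 file whose ninth column holds `key=value` pairs separated by semicolons, as in the two example lines. Attaching the result to the model as CV terms is left out.

```python
from urllib.parse import unquote


def parse_gff_attributes(column9):
    """Split the ninth GFF3 column ("key=value;key=value") into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" not in field:
            continue  # skip empty or malformed fields
        key, value = field.split("=", 1)
        # GFF3 percent-encodes reserved characters in attribute values
        attrs[key.strip()] = unquote(value.strip())
    return attrs


def extract_gene_annotations(gff_lines):
    """Return {locus_tag: {"gene_id": ..., "name": ...}} for all 'gene' rows."""
    genes = {}
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank line, comment, or directive
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # only 'gene' features are used in this sketch
        attrs = parse_gff_attributes(cols[8])
        # Dbxref is a comma-separated list of "database:accession" pairs
        dbxref = {}
        for xref in attrs.get("Dbxref", "").split(","):
            if ":" in xref:
                db, accession = xref.split(":", 1)
                dbxref[db] = accession
        locus_tag = attrs.get("locus_tag")
        if locus_tag is None:
            continue  # nothing to key the entry on
        genes[locus_tag] = {
            "gene_id": dbxref.get("GeneID"),
            "name": attrs.get("Name"),
        }
    return genes


# The 'gene' line from the example above, rebuilt with tab separators.
example_line = "\t".join([
    "NC_XXX.1", "RefSeq", "gene", "12508", "13482", ".", "+", ".",
    "ID=gene9;Dbxref=GeneID:4917798;Name=XXX;gbkey=Gene;"
    "gene_biotype=protein_coding;locus_tag=XXX",
])
annotations = extract_gene_annotations([example_line])
print(annotations)
```

Keying the result on the locus tag is an assumption that the gene identifiers in the model are derived from locus tags; if they are based on protein IDs instead, the CDS rows would be the ones to parse.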
We already implemented something similar in refineGEMs. 🤔
See the function cv_ncbiprotein. The section "Functions to add additional URIs to GeneProducts" in the same module also contains functions to add more annotations to the GeneProducts.
It would be very useful to have annotations for KEGG subsystems and genes assigned by the ModelPolisher. This would save a lot of time and work during model curation.
To obtain gene annotations, one of the following methods could be used:
conv operation
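As a rough sketch of how the conv operation could be used, assuming it refers to the KEGG REST API endpoint of the form `https://rest.kegg.jp/conv/<target_db>/<source_db>`, the following Python snippet builds the request URL and parses a tab-separated response into a mapping. The helper names `conv_url` and `parse_conv_response` are hypothetical, the sample response contains made-up placeholder identifiers, and no network request is made. Note that KEGG's license terms apply to any data retrieved this way.

```python
KEGG_REST = "https://rest.kegg.jp"


def conv_url(target_db, source_db):
    """Build a conv request URL, e.g. KEGG organism code vs. 'ncbi-geneid'."""
    return f"{KEGG_REST}/conv/{target_db}/{source_db}"


def parse_conv_response(text):
    """Parse 'source_id<TAB>target_id' lines into {source_id: [target_id, ...]}."""
    mapping = {}
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        source_id, target_id = parts
        # one source ID may map to several target IDs
        mapping.setdefault(source_id, []).append(target_id)
    return mapping


# Placeholder response text in the assumed two-column layout (not real data).
sample_response = (
    "ncbi-geneid:0000001\txxx:XXX_0001\n"
    "ncbi-geneid:0000002\txxx:XXX_0002\n"
)
url = conv_url("eco", "ncbi-geneid")
gene_map = parse_conv_response(sample_response)
print(url)
print(gene_map)
```

The response text would normally come from an HTTP GET on the built URL; that step is left out here so the sketch runs offline.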