Adding another protein function annotation tool

cdanielmachado / carveme

CarveMe: genome-scale metabolic model reconstruction

Adding another protein function annotation tool #138

Open PedroMTQ opened 2 years ago

PedroMTQ commented 2 years ago

Hello,

We have recently published a protein function annotation tool (Mantis) and are now interested in building metabolic networks using the annotations provided by this tool as a seed for network generation. I see that you already include eggNOG-mapper; do you think it would be possible to include our tool as well? If not officially, could you please advise on the best output format so that it can be fed directly to CarveMe? Also, do you use an internal knowledge base for matching annotations to the universal network? I suppose having different ontologies would be a constraint. In any case, the tool we've developed uses a wide range of ontologies, so it shouldn't be too much of a hassle.

Regards, Pedro

cdanielmachado commented 2 years ago

Hi Pedro,

CarveMe matches input sequences to an internal protein database that connects BiGG genes to BiGG reactions.

The integration with eggnog-mapper required adding BiGG gene identifiers as an additional column in the eggnog-mapper annotations output file. This was done by the main developer of eggnog-mapper.

You can do the same from your end, if you would like to. From BiGG you can get the BiGG gene ids and respective DNA and AA sequences: http://bigg.ucsd.edu/models/e_coli_core/genes/b0008

But to be honest, this is not the best solution in the long term. I hope in the future to enable more conventional annotations like EC numbers or GO terms.

PedroMTQ commented 2 years ago

Hello Daniel,

Sorry for the delay in getting back to you. I was looking into this now and was wondering how to do the homology search. My initial idea was to cluster by BiGG reaction, build a multiple sequence alignment (MSA) of the BiGG genes associated with each reaction, and then create HMMs from these MSAs. Alternatively, I could just create a diamond database of BiGG genes and feed it to my protein function annotation tool (Mantis). Since Mantis is flexible in which databases it uses as a reference, both ways would be viable. Please also let me know which output format would be preferable. This is the current output format: https://github.com/PedroMTQ/mantis/wiki/Output. I could implement something else so that it is compatible with CarveMe.

Regards, Pedro
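For readers following along, the second option above (a diamond database built from BiGG gene sequences) could be sketched roughly as follows. This is only an illustration: the file names are hypothetical, it assumes the BiGG protein sequences have already been collected into a single FASTA file, and it assumes the `diamond` executable is on the PATH. The function only builds the command lines; `run_all` is what would actually execute them.

```python
import shlex
import subprocess


def diamond_commands(bigg_fasta, db_prefix, query_fasta, hits_tsv):
    """Return the two diamond invocations as argument lists (nothing is run here).

    All paths are hypothetical placeholders chosen for this sketch.
    """
    # Step 1: build a diamond database from the BiGG protein FASTA file.
    makedb = ["diamond", "makedb", "--in", bigg_fasta, "--db", db_prefix]
    # Step 2: search the query proteome against it, writing tabular output.
    blastp = [
        "diamond", "blastp",
        "--db", db_prefix,
        "--query", query_fasta,
        "--out", hits_tsv,
        "--outfmt", "6",
    ]
    return makedb, blastp


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        print("running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)


# Example (not executed here):
# run_all(diamond_commands("bigg_genes.faa", "bigg_genes", "genome.faa", "hits.tsv"))
```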

cdanielmachado commented 2 years ago

Hi Pedro,

Sorry for the late reply as well. By default, CarveMe expects a TSV file generated with diamond. If you generate the columns in the same order as the diamond default, it should work.
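As a rough illustration of what "the same order as the diamond default" means: diamond's default tabular output has twelve tab-separated columns and no header line. The sketch below writes hits in that order. The hit dictionaries are a hypothetical stand-in for whatever Mantis produces internally, and the subject identifier shown is an assumption; it would have to match the gene identifiers used in CarveMe's internal database, which this thread does not specify.

```python
import csv
import io

# Column order of diamond's default tabular output (BLAST tabular, format 6).
DIAMOND_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def write_diamond_style_tsv(hits, handle):
    """Write one tab-separated line per hit, without a header line.

    `hits` is an iterable of dicts keyed by the column names above
    (a hypothetical intermediate format, not Mantis's actual output).
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for hit in hits:
        writer.writerow([hit[col] for col in DIAMOND_COLUMNS])


# Made-up example hit, for illustration only.
example_hit = {
    "qseqid": "query_1", "sseqid": "b0008", "pident": 98.5, "length": 317,
    "mismatch": 5, "gapopen": 0, "qstart": 1, "qend": 317,
    "sstart": 1, "send": 317, "evalue": 1e-50, "bitscore": 600,
}
buffer = io.StringIO()
write_diamond_style_tsv([example_hit], buffer)
```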

Otherwise it would probably not be too hard for you to fork the code and add another flag (--mantis) to the command line interface to read your file format instead.
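A minimal sketch of what such a flag could look like with Python's `argparse` is shown below. This is not CarveMe's actual command line code: apart from the `--mantis` flag name suggested above, the parser, the positional argument, and the returned format labels are all hypothetical and only illustrate the dispatch idea.

```python
import argparse


def build_parser():
    """Hypothetical parser; only the --mantis flag name comes from the thread."""
    parser = argparse.ArgumentParser(
        description="Illustrative input-format selection (not CarveMe's real CLI)."
    )
    parser.add_argument("input", help="annotation or alignment file to read")
    parser.add_argument(
        "--mantis",
        action="store_true",
        help="treat the input as Mantis output instead of a diamond-style TSV",
    )
    return parser


def select_input_format(argv=None):
    """Return a label for the reader that should handle the input file."""
    args = build_parser().parse_args(argv)
    # A fork would call its own Mantis parser here; this sketch only
    # reports which branch would be taken.
    return "mantis" if args.mantis else "diamond"
```

For example, `select_input_format(["annotations.tsv", "--mantis"])` would take the Mantis branch, while omitting the flag falls back to the diamond-style reader.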