franciscozorrilla / metaGEM

:gem: An easy-to-use workflow for generating context specific genome-scale metabolic models and predicting metabolic interactions within microbial communities directly from metagenomic data
https://franciscozorrilla.github.io/metaGEM/
MIT License

Regarding usage, installing CPLEX & CarveMe #12

Closed ShailNair closed 3 years ago

ShailNair commented 3 years ago

Hi, the package looks interesting, and I would like to give it a shot on my data. I have a suite of prokaryotic MAGs and would like to run the GEM pipeline on them without starting from assembly and MAG generation. Is it possible to do so?

franciscozorrilla commented 3 years ago

Hi Shail,

Absolutely! Since metaGEM is implemented as a series of Snakemake rules, it is possible to use the pipeline in a modular fashion. You can simply begin with the bash metaGEM.sh --task carveme command, which will use the CarveMe tool for generating your metabolic models.

Please note that you will need to obtain a free academic license and manually install the CPLEX solver to use CarveMe. You can use CarveMe as a standalone tool or through the metaGEM pipeline. If you run CarveMe through metaGEM, note that the Snakefile will identify your MAGs by searching for files with extension .faa under the protein_bins subfolder. Also note that CarveMe takes as input ORF annotated protein fasta files, so you may need to run prodigal or similar software if you are starting from DNA MAGs.
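
If your MAGs are nucleotide FASTA files, the ORF-calling step mentioned above could look roughly like the sketch below. It assumes prodigal is on your PATH and that your DNA bins end in .fa; the function name annotate_mags and the dna_bins input folder are hypothetical and not part of metaGEM. Check the Snakefile for the exact layout it expects under protein_bins.

```shell
# Hypothetical helper (not part of metaGEM): call ORFs on every DNA MAG
# in a folder and write one protein FASTA (.faa) per MAG.
# The ( ... ) body runs in a subshell, so its variables stay local.
annotate_mags() (
    in_dir="$1"; out_dir="$2"
    mkdir -p "$out_dir"
    for mag in "$in_dir"/*.fa; do
        [ -e "$mag" ] || continue      # skip if the glob matched nothing
        id=$(basename "$mag" .fa)
        # -i: input contigs, -a: protein translations,
        # -o: gene coordinates (discarded here),
        # -p single: treat each MAG as a single genome, -q: quiet
        prodigal -i "$mag" -a "$out_dir/$id.faa" -o /dev/null -p single -q
    done
)
```

A call such as annotate_mags dna_bins protein_bins would then leave one .faa file per MAG in protein_bins.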

Finally, if your MAGs come from short-read data and you have the time/resources, I would encourage you to try using metaGEM for MAG reconstruction to see if it yields better quality MAGs. You may also find this tutorial useful to see how metaGEM can be used. Let me know if you have any further questions and thank you for your interest in the pipeline!

Best wishes, Francisco

ShailNair commented 3 years ago

Thanks. I will try to set it up and run it.

ShailNair commented 3 years ago

Hi, unfortunately, my request to obtain a free academic license was rejected (even though my university is eligible for the program). A colleague also tried and got the same response.

franciscozorrilla commented 3 years ago

Hi Shail, sorry to hear that. I think the link I had in the README file was broken. Is this the portal where you tried to get the license from? Did they give a reason for rejecting your request?

ShailNair commented 3 years ago

Yes, I saw that the previous link was not correct. I used this link. Unfortunately, it seems my university is not a registered member of the IBM Academic Initiative program. Is there any other way? Registering is a long process and has to go through university officials.

franciscozorrilla commented 3 years ago

That is a shame, I hope your university can register successfully. If you have access to a high performance computer cluster you could check if CPLEX is installed and loadable as a module, or you could ask your cluster support people if they are able to install CPLEX on the cluster for you. Alternatively you could try using the gurobi solver instead of CPLEX, as discussed in this issue from the CarveMe repository, although I have never tried using gurobi with CarveMe and it seems like it may not give you the same results.

ShailNair commented 3 years ago

Thanks. Unfortunately, CPLEX isn't installed on my lab server, but I did manage to get a gurobi solver license. If I am not wrong, I followed the manual method: I first created a conda env and installed metaGEM; I already have metaWRAP (with CheckM); then I created the prokka-roary env and installed it.

Finally, I activated the metagem env and installed the gurobi solver.

Is that correct?

franciscozorrilla commented 3 years ago

Glad you could get the gurobi solver. Your installation procedure sounds correct to me, although you do not need the prokka-roary environment to run CarveMe. Try activating the metagem environment and running the command carve to check that CarveMe is installed successfully and able to locate/use the gurobi solver. If you don't get any errors, you can try running CarveMe on a test MAG to see if it generates a model successfully:

carve -g M3 -v --mediadb /path/to/media_db.tsv --fbc2 -o test_MAG_output.xml test_MAG_input.faa

If you cloned the github repo then the media_db file should be under the Scripts folder. M3 is complete media, but you can use any media in the database or even create your own custom media based on BiGG metabolite IDs.
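
Once the single test MAG works, the same command can be repeated over a folder of protein FASTA files. The sketch below is only for standalone use (the metaGEM carveme task already does this for you); the function name carve_all and its three arguments are hypothetical, and the carve flags are exactly those of the test command above.

```shell
# Hypothetical helper: build one model per protein FASTA with the same
# flags as the single-MAG test command.
# The ( ... ) body runs in a subshell, so its variables stay local.
carve_all() (
    faa_dir="$1"; media_db="$2"; out_dir="$3"
    mkdir -p "$out_dir"
    for faa in "$faa_dir"/*.faa; do
        [ -e "$faa" ] || continue      # skip if the glob matched nothing
        id=$(basename "$faa" .faa)
        # Report failures but keep going with the remaining MAGs
        carve -g M3 -v --mediadb "$media_db" --fbc2 -o "$out_dir/$id.xml" "$faa" \
            || echo "carve failed for $id" >&2
    done
)
```

For example: carve_all protein_bins /path/to/media_db.tsv models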

ShailNair commented 3 years ago

CarveMe cannot locate the gurobi solver and throws an error saying "Solver cplex not available":

carve -g M3 -v --mediadb /home/mcs/soft/metaGEM/Scripts/media_db.tsv --fbc2 -o /home/mcs/miniconda3/envs/metagem/test_MAG_output.xml /home/mcs/miniconda3/envs/metagem/SN_MAG_00001-contigs.faa
Traceback (most recent call last):
  File "/home/mcs/.local/bin/carve", line 5, in <module>
    from carveme.cli.carve import main
  File "/home/mcs/.local/lib/python3.8/site-packages/carveme/__init__.py", line 14, in <module>
    set_default_solver(config.get('solver', 'default_solver'))
  File "/home/mcs/.local/lib/python3.8/site-packages/reframed/solvers/__init__.py", line 60, in set_default_solver
    raise RuntimeError(f"Solver {solvername} not available.")
RuntimeError: Solver cplex not available.

ShailNair commented 3 years ago

It worked after changing the default solver to gurobi in the CarveMe config.cfg file. The output is a .tsv file and an XML file. How to visualize this data into a graphical model.

SN_MAG_00001-contigs.tsv.txt test_MAG_output.xml.txt

what is --fbc2 in the above carve command? Is Species metabolic coupling analysis with SMETANA which is mentioned as [in progress] ready to be used?

Similarly, how to build a custom media_db.tsv file (for say I want to construct a model outputting ammonia production and the associated organisms within the metagenome)

franciscozorrilla commented 3 years ago

Good to hear that you are able to get CarveMe running. You can ignore the .tsv file, as it is an intermediate file; your output model is the .xml file.
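
For anyone else hitting the "Solver cplex not available" error, the config change that fixed it can be sketched as below. The traceback shows CarveMe reading the default_solver key from the solver section of its config; the sketch assumes config.cfg sits inside the installed carveme package folder and stores that key as a "default_solver = ..." line. The function name set_default_solver is a hypothetical shell helper, not a CarveMe command.

```shell
# Hypothetical helper: set default_solver in a CarveMe-style config.cfg.
# Keeps a .bak copy of the original file next to it.
# The ( ... ) body runs in a subshell, so its variables stay local.
set_default_solver() (
    cfg="$1"; solver="$2"
    cp "$cfg" "$cfg.bak"
    # Rewrite only the line that starts with "default_solver"
    sed "s/^default_solver[[:space:]]*=.*/default_solver = $solver/" "$cfg.bak" > "$cfg"
)
```

With the install location from the traceback, the call would be something like set_default_solver /home/mcs/.local/lib/python3.8/site-packages/carveme/config.cfg gurobi, after which carve should no longer ask for CPLEX.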

How to visualize this data into a graphical model.

Since metaGEM is designed to reconstruct thousands of genome scale metabolic models from metagenomes I did not include any visualization of individual models. Instead, after generating many models you can run the bash metaGEM.sh -t modelVis command to get an overall idea of the distribution of genes, reactions, and metabolites (see unseenbio demo section 7). Beware that this command works under the assumption that your models are labeled according to the following scheme: {sampleID}_{binNumber}.xml. You could also create a summary file with number of metabolites, number of reactions, and number of genes by running the following bash code (taken from modelVis Snakefile rule, see line 1181) from within the folder where your models are stored and assuming that they end with the .xml extension:

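# Summarize each SBML model (*.xml) as: id, metabolites, reactions, genes
while read -r model; do
    # Model ID is the file name without the .xml extension
    id=$(basename "$model" .xml)
    # Unique metabolites: 8th space-separated field of each species line,
    # minus its last two characters (relies on the SBML layout CarveMe writes)
    mets=$(grep "species id=" "$model" | cut -d ' ' -f 8 | sed 's/..$//' | sort -u | wc -l)
    rxns=$(grep -c 'reaction id=' "$model")
    # Gene products, excluding any entry labeled "spontaneous"
    genes=$(grep 'fbc:geneProduct fbc:id=' "$model" | grep -vic spontaneous)
    echo "Model: $id has $mets mets, $rxns reactions, and $genes genes ... "
    echo "$id $mets $rxns $genes" >> GEMs.stats
done < <(find . -name "*.xml")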

This should generate the GEMs.stats file, which you can visualize yourself or using the modelVis.R script.
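
If you only want a quick numeric overview before plotting, the per-column means of GEMs.stats can be computed with awk. This is a minimal sketch that assumes the four space-separated columns written by the loop above (id, metabolites, reactions, genes); the function name summarize_gem_stats is hypothetical.

```shell
# Hypothetical helper: print per-column means of a GEMs.stats file
# (columns: id, metabolites, reactions, genes; space-separated).
summarize_gem_stats() {
    awk 'NF == 4 { n++; m += $2; r += $3; g += $4 }
         END {
             if (n == 0) { print "no models found"; exit 1 }
             printf "models=%d mean_mets=%.1f mean_rxns=%.1f mean_genes=%.1f\n", n, m/n, r/n, g/n
         }' "$1"
}
```

Run it as summarize_gem_stats GEMs.stats. Because the loop appends to GEMs.stats, delete the file before re-running the loop, otherwise models are counted twice.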

For quick individual model visualization I can recommend the fluxer webtool. You may also find the memote webtool useful for quick inspection of individual models.

what is --fbc2 in the above carve command?

As described in the documentation, the --fbc2 flag tells CarveMe to output the model in SBML FBC2 format (Flux Balance Constraints version 2) instead of the default, older COBRA format.

Is Species metabolic coupling analysis with SMETANA which is mentioned as [in progress] ready to be used?

Yes, SMETANA is already implemented in the metaGEM workflow; the [in progress] refers only to the unseenbio demonstration results, which I plan to finish soon. Beware that SMETANA also uses the CPLEX solver by default, but you may be able to use the gurobi solver by setting the --solver flag as described in the documentation.
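
A standalone SMETANA call with the solver switched might look like the sketch below. Only the --solver flag comes from this thread; -d (detailed mode) and -o (output prefix) are as I recall them from the SMETANA help text, so please confirm them with smetana -h. The wrapper name run_smetana_gurobi is hypothetical.

```shell
# Hypothetical wrapper: run SMETANA on a folder of models with the
# gurobi solver instead of the default.
# The ( ... ) body runs in a subshell, so its variables stay local.
run_smetana_gurobi() (
    model_dir="$1"; out_prefix="$2"
    # -d: detailed mode, -o: output prefix, --solver: solver to use
    smetana -d --solver gurobi -o "$out_prefix" "$model_dir"/*.xml
)
```

For example: run_smetana_gurobi models community1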

how to build a custom media_db.tsv file

I have not tried creating my own media recipes yet, and this is something you will need to do manually. For example if you want to create a new medium called M42 with metabolites leucine, glucose, and tryptophan then you would add to the end of the media_db.tsv file the following lines:

M42 M42 leu__L  leu__L
M42 M42 glc__D  glc__D
M42 M42 trp__L  trp__L
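
Appending such rows by hand is error-prone, so a small helper can do it. This sketch assumes the file is tab-separated with four columns (medium, description, compound, name), as in the example rows above, and reuses the medium ID as description and the BiGG ID as name. The function name add_medium is hypothetical.

```shell
# Hypothetical helper: append a medium to a media database file as
# tab-separated rows: medium, description, compound, name.
# The ( ... ) body runs in a subshell, so its variables stay local.
add_medium() (
    db="$1"; medium="$2"; shift 2
    # Add a final newline first if the file lacks one, so rows do not merge
    if [ -s "$db" ] && [ -n "$(tail -c 1 "$db")" ]; then echo >> "$db"; fi
    for met in "$@"; do
        printf '%s\t%s\t%s\t%s\n' "$medium" "$medium" "$met" "$met" >> "$db"
    done
)
```

For example, add_medium Scripts/media_db.tsv M42 leu__L glc__D trp__L adds the three rows shown above, and the new medium can then be selected with carve -g M42.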

ShailNair commented 3 years ago

Thanks. I tried a single model with fluxer and it works nicely. I will try modelVis once I have run the pipeline on all MAGs.

You said the M3 is a complete media, does it mean that it encompasses all the metabolic reactions of prokaryotes? What is the cut off for assigning a metabolite to a protein? I see in the intermediate TSV file that different proteins are assigned to a metabolite/reaction at different cut-offs.

franciscozorrilla commented 3 years ago

You said the M3 is a complete media, does it mean that it encompasses all the metabolic reactions of prokaryotes?

I am not sure what you mean by encompassing all metabolic reactions of prokaryotes. If you mean that this media should support the growth of prokaryotes then yes, although there are of course prokaryotes that are not presently cultivable due to unknown growth conditions/physiology. You can look into the media_db.tsv file and search for M3 to see which metabolites are present, and to get an overview of what is in M3 you can also look at Fig. 1 panel D of this paper:

[Image: Fig. 1 panel D of the paper referenced above]

What is the cut off for assigning a metabolite to a protein?

Since I am not intimately familiar with the implementation details I encourage you to look at the methods/supplementary sections of the CarveMe paper.

Sorry for the late response, hope it helps!

franciscozorrilla commented 3 years ago

Closing this due to lack of activity but please re-open if you have any follow up questions. Best, Francisco