Closed KDeaton closed 2 years ago
Hi @KDeaton,
To be more precise about the targets used in the article for the 1,520 culturable species in the human gut: we used the addedvalue (i.e. the metabolites producible by the community but not by individual organisms alone) as the targets for the m2m pipeline, so for this pipeline we have only one list of targets. This allows us to find the key species associated with these targets.
Then we classified these metabolites into 6 different categories (such as lipids or sugars), and we used each of these 6 groups as targets for the m2m_analysis pipeline to visualize the minimal communities with powergraphs. But we only did this because we had a lot of targets in the addedvalue (156 metabolites).
By annotation quality, do you mean the genome annotation or the SBML quality? For the genome annotation, it is quite difficult to estimate the quality of an annotation, especially when dealing with metagenomics data. It will depend on the tool used for the annotation, such as Prokka or eggnog-mapper (Prokka being faster but less accurate than eggnog-mapper). And there can be big variations (for example, in our article they range from genomes associated with 500 reactions to genomes associated with 2,500 reactions, as you can see in subfigure b of this supplementary figure). For the SBML quality, the SBML files created by metage2metabo contain few annotations, so SBML quality-check tools (such as memote) might give a very low score to those files. Nonetheless, the information they contain is sufficient for m2m.
There is no command to ignore the failed builds with m2m. An easy option, if you want to continue the analysis without the failed builds, is to remove them from the input folder. You can find the failed builds using the resume_inference.tsv file inside the folder m2m_output_folder/pgdb_log: the failed builds have an ERROR in their gene_number column. When you relaunch m2m, it will use the successful builds stored in the ptools-local folder and create the corresponding SBML files.
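As a rough illustration of that lookup, here is a minimal Python sketch that lists the builds flagged with ERROR. It only relies on the gene_number column mentioned above; it assumes that the first column of resume_inference.tsv holds the build name, and the function name find_failed_builds is purely hypothetical (it is not part of m2m or mpwt).

```python
import csv


def find_failed_builds(tsv_path):
    """Return the names of builds whose gene_number column is ERROR."""
    failed = []
    with open(tsv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # Assumption: the first column holds the build (genome) name.
        name_column = reader.fieldnames[0]
        for row in reader:
            if (row.get("gene_number") or "").strip() == "ERROR":
                failed.append(row[name_column])
    return failed


if __name__ == "__main__":
    import os

    log_file = "m2m_output_folder/pgdb_log/resume_inference.tsv"
    if os.path.exists(log_file):
        for name in find_failed_builds(log_file):
            print(name)
```

The names printed this way are the folders you would then remove from the input folder before relaunching m2m.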
Another way is to keep the failed builds and still create the PGDB files for the successful builds. There is a possible workaround with mpwt. I recently released a new version of mpwt (0.7.0) that refactors how mpwt works. With this version each run is independent, so if one fails the others will still be processed to the end. So in this case it will produce the PGDB files for m2m.
If you can't update to this version, older versions of mpwt have an option, --ignore-error, that allows the draft reconstruction to continue even if some builds have failed.
In both cases, you have to use the mpwt command mpwt -f m2m_input_folder -o m2m_output_folder/pgdb --patho --flat --md -v --cpu X, adding --ignore-error if you used the second option. But this will only produce PGDB files for the successful builds, and it will not create the SBML files. To go further you need to fix the issue with the failed builds or remove them.
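If you script this step, the command line above can be assembled as in the small Python sketch below. It only uses the flags quoted in this comment; the helper name build_mpwt_command and the CPU count of 4 (standing in for X) are illustrative assumptions, and the sketch only prints the command instead of running it.

```python
import shlex


def build_mpwt_command(input_folder, output_folder, cpu, ignore_error=False):
    """Assemble the mpwt call quoted above as a list of arguments."""
    command = [
        "mpwt",
        "-f", str(input_folder),
        "-o", str(output_folder),
        "--patho", "--flat", "--md", "-v",
        "--cpu", str(cpu),
    ]
    # Only needed with older mpwt versions (second option above).
    if ignore_error:
        command.append("--ignore-error")
    return command


if __name__ == "__main__":
    # 4 is a placeholder for the number of CPUs (X in the command above).
    cmd = build_mpwt_command("m2m_input_folder", "m2m_output_folder/pgdb", 4)
    print(shlex.join(cmd))
```

Passing the returned list to subprocess.run would launch mpwt itself, provided it is installed and on the PATH.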
To find out why some builds failed, you can take a look at the pathologic.log files located in the input folder. They should contain the errors encountered by Pathway Tools during the inference.
Thanks for both of your responses! I'm all set with questions 1 & 3. For more information on my question 2, when I ran a large metagenome that had a few builds fail, the resume_inference.tsv listed at least 10 in the pwt_warning column. When I ran recon again on a subset of genomes that had successful builds, the process finished successfully and created the sbml files, though I didn't get a resume_inference.tsv file. When I check the pathologic.log, there are several warnings. Here are some examples:
Warning: The Location "join(1450558..1451127,1..24)" shows a first basepair number that is bigger than the second. This should only happen when crossing the origin.
Warning: tRNA IPF37_06710 (NIL) may not have had parsable anticodon information. None assigned.
No reaction or class having EC number 5.6.2.c can be found in the MetaCyc DB.
Warning: enter-into-lookup-table-internal: Why does acylactivating have 53 associated reactions??
Thanks for the examples, I understand your question better now.
These warnings come from Pathway Tools and they can have multiple meanings:
I print these warnings, but they are mostly informational for the user. Some warnings may need manual curation (to decide whether to keep the reaction proposed for or associated with the gene). For example, in your last example it could be interesting to look at the 53 reactions associated with an enzyme. The issue is that when dealing with hundreds or thousands of reconstructions, we do not have the time to check all of them.
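One way to triage the warnings across many reconstructions is to count how often each kind appears. The Python sketch below is only a heuristic and not part of mpwt or m2m: it assumes that each genome subfolder of the input folder contains a pathologic.log, it only counts lines starting with "Warning:" (so messages without that prefix, such as the EC number one, are skipped), and it replaces digits with N so that similar messages are grouped. The function names are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path


def normalise_warning(line):
    """Replace runs of digits with N so similar warnings group together."""
    return re.sub(r"\d+", "N", line.strip())


def tally_warnings(input_folder):
    """Count 'Warning:' lines over every pathologic.log under input_folder."""
    counts = Counter()
    for log_path in Path(input_folder).rglob("pathologic.log"):
        with open(log_path, errors="replace") as handle:
            for line in handle:
                if line.startswith("Warning:"):
                    counts[normalise_warning(line)] += 1
    return counts


if __name__ == "__main__":
    # Print the most frequent kinds of warnings first.
    for message, count in tally_warnings("m2m_input_folder").most_common(20):
        print(count, message, sep="\t")
```

The rare or unusual kinds (like the enzyme with 53 associated reactions) are then easier to spot and curate by hand.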
As for mpwt not producing a log on your second run, I will look into it to find out why it failed.
Hi there! I have a few process questions: