franciscozorrilla / metaGEM

:gem: An easy-to-use workflow for generating context specific genome-scale metabolic models and predicting metabolic interactions within microbial communities directly from metagenomic data
https://franciscozorrilla.github.io/metaGEM/
MIT License

Questions about media, gapfilling, and predicting interactions #139

Closed Qing-microbiol closed 7 months ago

Qing-microbiol commented 9 months ago

Dear Francisco,

Thank you so much for the quick response and detailed instructions!

Actually, I would like to run the CarveMe, memote, and SMETANA parts of this pipeline, starting with CarveMe. I have moved the annotated MAG *.faa files into the protein_bins folder, which config.yaml lists as the input for CarveMe. What else should I do to run it? I keep getting errors in my trials and would really appreciate your guidance.

Regards, Qing

Originally posted by @Qing-microbiol in https://github.com/franciscozorrilla/metaGEM/issues/133#issuecomment-1750130171

franciscozorrilla commented 9 months ago

Hi Qing, I think the question regarding usage from any point in the Snakefile is answered here:

https://github.com/franciscozorrilla/metaGEM/issues/133#issuecomment-1750689755

Regarding your specific errors, could you please provide me with information regarding what commands you are running, and what error messages you are getting?

Qing-microbiol commented 9 months ago

Dear Francisco,

Thanks for your advice on editing rules in the Snakefile!

Before trying the pipeline, I'm trying to add my specific media to the media.tsv file for CarveMe gapfilling. I'm wondering how you got the corresponding BiGG IDs for all components of the media from the reference. I find it difficult to find BiGG IDs for all components of complex ingredients, such as tryptone. I tried to trace the media composition back to the reference but didn't find a document with both component names and BiGG IDs. Do you have any suggestions for getting all the BiGG IDs faster than adding them one by one?

Thanks again for your help! Qing

franciscozorrilla commented 9 months ago

Hi Qing,

Have a look at the data access page in the bigg database, bigg_models_metabolites.txt is probably the file you are looking for, connecting all metabolite IDs with their names + other info. In my experience, when creating media files I usually try to start with some already existing media composition file and tailor it to my needs, for example I can suggest this milk media file or start from one of the existing CarveMe media compositions.
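As a rough illustration of the lookup described above, here is a minimal sketch that searches a BiGG metabolite table by name. It assumes the file is tab-separated with `bigg_id`, `universal_bigg_id`, and `name` columns; check the header of your downloaded copy before relying on those names. The function name `find_bigg_ids` and the three sample rows are only there for illustration.

```python
import csv
import io


def find_bigg_ids(tsv_text, query):
    """Return (universal_bigg_id, name) pairs whose name contains `query`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    query = query.lower()
    hits = []
    for row in reader:
        name = row.get("name") or ""
        if query in name.lower():
            # Assumed column: universal_bigg_id is the ID without compartment suffix.
            hits.append((row["universal_bigg_id"], name))
    # Each compartment has its own row, so drop duplicates but keep the order.
    return list(dict.fromkeys(hits))


# Small illustrative sample in the assumed layout. For the real file, use:
#   with open("bigg_models_metabolites.txt") as fh:
#       text = fh.read()
sample = (
    "bigg_id\tuniversal_bigg_id\tname\n"
    "glyc_c\tglyc\tGlycerol\n"
    "glyc_e\tglyc\tGlycerol\n"
    "glc__D_e\tglc__D\tD-Glucose\n"
)
print(find_bigg_ids(sample, "glycerol"))
```

Complex ingredients such as tryptone will not have a single entry, so the search would need to be run per constituent (amino acids, peptides, and so on).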

Hope this helps, and I can also suggest to check out the CarveMe github repo issues section in case there are relevant media file/composition discussions there.

Best, Francisco

Qing-microbiol commented 9 months ago

Dear Francisco,

Thanks for your suggestions! Now I have my media composition file ready:)

I have read a lot of closed issues and found more information about CarveMe model construction and SMETANA simulation, but then I got confused about how I could achieve my objective after running them. I would appreciate it if you could give some instructions.

My goal is to see how the interactions of gut microbiota from different samples change if I add one compound to my base media. For example, my base media is M3 and I add mucin to M3; I would then like to see which significantly different interactions come up with the mucin addition. In this case, should I construct GEMs based on individual MAGs or on all the MAGs from one sample? For growth on M3, I would like to grow a human fecal sample, i.e. all MAGs from one sample rather than a single strain or MAG. And should I use M3 with mucin to fill the gaps with CarveMe, then use M3 without mucin for the simulation with SMETANA?

Thank you very much for your patience and help! Qing

franciscozorrilla commented 9 months ago

Hi Qing,

No problem, these are great questions!

In this case, should I construct GEM based on individual MAG or on all the MAGs from one sample?

I have seen people create single models for the entire community by combining individual models. However, in my experience I have always built individual GEMs from MAGs and then simulated them together in their corresponding community with SMETANA. So it is not necessary to combine them all into one model for the entire sample, although the choice is up to you.

And should I use M3 with mucin to fill the gaps with carveme? Then use M3 without mucin to do simulation with smetana?

Indeed, you are on the right track here. However, it will not work for mucin because it is actually not present in the metabolic models or the underlying BiGG database (feel free to check!). Luckily, I have already answered very similar questions before, so I can recommend you read these two discussions in case you haven't seen them yet #91 & #112.

Highlights from #91

Regarding gapfilling and simulation media:

For example, if you gapfill 2 community members on full/complete media and try to simulate metabolic exchanges in the same media then you will get no interactions. This is because CarveMe gapfilled the models so that they can grow purely on the media without needing any additional metabolites. On the other hand, if you gapfill models on full/complete media and then try to simulate metabolic exchanges in a less-rich media (e.g. remove certain metabolites from the gapfilling media), then SMETANA will predict the metabolic exchanges that are possible in order to sustain growth of all community members in the given simulation media.

Regarding mucin:

Note that in the media file M8 and M3 have the same exact composition since mucin is not present in the metabolic models generated by CarveMe, as it is not a metabolite in the BiGG database.

Highlights from #112

Regarding gapfilling and simulation media:

This makes sense, and it has to do with the fact that when you gapfill on a given media, you are telling CarveMe that a given model needs to grow under a given media condition, so it will add metabolic reactions that support growth on that media. Therefore, the richer (i.e. more comprehensive) the gapfilling media, the fewer reactions need to be added to the model in order to support growth, while a minimal media would require more reactions to be added to the model since there are fewer metabolic building blocks (i.e. nutrients) to work with.

Similarly, when it comes to simulation media, you should indeed expect to get fewer cross-feeding interactions in the richer (i.e. more comprehensive) media compared to the less-rich media. The reason for this is that the interactions represent exchanges where the microbes are unable to obtain a particular necessary nutrient metabolite from the media, and must therefore exchange it with another member (presumably with a different set of metabolic capacities) in order to grow. For example, if you have a complete media, i.e. models can uptake any metabolite that they want, then you would get 0 exchanges between community members, because the species would just grow directly from the media. On the other hand, if you have a very minimal media, then this would encourage the community members to interact in order to compensate for each other's auxotrophies.

In fact, if you gapfill models on M8+MEU and then simulate those models in M8+MEU media then I would expect that no interactions are predicted. Again, this has to do with the fact that the models should be growing directly on the media that they were gapfilled for. In your case, I would probably try gapfilling on M8+MEU and then simulating on M8 or MEU.
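The reasoning above can be illustrated with a deliberately simplified toy: plain set arithmetic, not SMETANA's actual optimization. The member names, their nutrient requirements, and the two media below are all made up for the example.

```python
def required_exchanges(needs, media):
    """For each member, list the nutrients it needs that the media lacks."""
    return {member: sorted(set(req) - set(media)) for member, req in needs.items()}


# Hypothetical nutrient requirements of two community members (BiGG-style IDs).
needs = {"A": ["glc__D", "his__L"], "B": ["glc__D", "trp__L"]}

rich_media = ["glc__D", "his__L", "trp__L"]  # e.g. the gapfilling media
lean_media = ["glc__D"]                      # same media with two compounds removed

print(required_exchanges(needs, rich_media))  # {'A': [], 'B': []}
print(required_exchanges(needs, lean_media))  # {'A': ['his__L'], 'B': ['trp__L']}
```

On the rich media nothing has to come from a partner, so no exchanges would be expected. On the leaner media each member has a gap that another member may or may not be able to fill, and that is the situation in which interactions get predicted.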

Some papers to read:

I think some flux balance analysis & genome-scale metabolic modelling literature may be helpful for understanding the underlying theory.

Hope this helps! Best wishes, Francisco

Qing-microbiol commented 9 months ago

Dear Francisco,

Thanks a lot for your answers and related highlights! These two highlights were exactly the things that brought me to my question above.

Based on your answer, can I say that if I would like to see how the interactions change after I add a compound (not mucin, but glycerol as an example) to M3, I need to fill gaps with M3+glycerol media and simulate with M3 media?

I have read more examples from your exercise module and from the CarveMe and SMETANA websites. I'm wondering if I could achieve the same goal by running CarveMe to get models without gapfilling, then simulating with SMETANA on M3 and M3+glycerol, respectively. Then I could compare the metabolites or bacteria with high SMETANA scores on M3 with those on M3+glycerol. What do you think? Will it also work for my purpose?

Thanks again for your help! Qing

franciscozorrilla commented 9 months ago

Hi Qing,

Based on your answer, can I say that if I would like to see how the interactions change after I add a compound (not mucin, but glycerol as an example) to M3, I need to fill gaps with M3+glycerol media and simulate with M3 media?

Exactly 👍
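To make the two-media setup concrete, here is a minimal sketch that copies a base medium in a media TSV and adds one compound to the copy. It assumes the four-column layout `medium`, `description`, `compound`, `name`. The helper name `add_compound`, the medium ID `M3_glyc`, and the two-compound "M3" in the sample are hypothetical and do not reflect the real M3 composition.

```python
import csv
import io

FIELDS = ["medium", "description", "compound", "name"]


def add_compound(media_tsv, base, new_id, new_desc, compound, name):
    """Return the media TSV text plus a copy of `base` with one extra compound."""
    rows = list(csv.DictReader(io.StringIO(media_tsv), delimiter="\t"))
    base_rows = [r for r in rows if r["medium"] == base]
    if not base_rows:
        raise ValueError(f"medium {base!r} not found")
    # Copy every compound of the base medium under the new medium ID.
    new_rows = [dict(r, medium=new_id, description=new_desc) for r in base_rows]
    new_rows.append(
        {"medium": new_id, "description": new_desc, "compound": compound, "name": name}
    )
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=FIELDS, delimiter="\t", lineterminator="\n", extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(rows + new_rows)
    return out.getvalue()


# Illustrative two-compound stand-in for the base medium.
sample = (
    "medium\tdescription\tcompound\tname\n"
    "M3\tBase medium\tglc__D\tD-Glucose\n"
    "M3\tBase medium\tnh4\tAmmonium\n"
)
updated = add_compound(sample, "M3", "M3_glyc", "M3 plus glycerol", "glyc", "Glycerol")
print(updated)
```

The richer medium would then be the one passed to `carve --gapfill`, and the base medium the one passed to `smetana -m`, both with `--mediadb` pointing at the same file.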

I'm wondering if I could achieve the same goal by running CarveMe to get models without gapfilling, then simulating with SMETANA on M3 and M3+glycerol, respectively. Then I could compare the metabolites or bacteria with high SMETANA scores on M3 with those on M3+glycerol. What do you think? Will it also work for my purpose?

I think that this is perhaps worth trying out at a small scale as a test example, if it works then great! However, I suspect that without gapfilling the models first, they may not be able to grow in your given media definition, and so there would be no exchanges predicted. Give it a shot and find out! :)

Best, Francisco

Qing-microbiol commented 9 months ago

Hi Francisco,

I have tried to run CarveMe with gapfilling on my Macfarlane media with glycerol. The plan was to simulate the interactions in Macfarlane media without glycerol, but I got an error during gapfilling. My command was:

    carve D11/SCHW21-1_D11_MAG_00000003.faa --fbc2 --gapfill macfgly --mediadb macf.tsv -o output_carveme/SCHW21-1_D11_MAG_00000003.xml

The error message is:

    Traceback (most recent call last):
      File "/home/qing/miniconda3/envs/carveme1/bin/carve", line 8, in <module>
        sys.exit(main())
      File "/home/qing/miniconda3/envs/carveme1/lib/python3.10/site-packages/carveme/cli/carve.py", line 357, in main
        maincall(
      File "/home/qing/miniconda3/envs/carveme1/lib/python3.10/site-packages/carveme/cli/carve.py", line 240, in maincall
        multiGapFill(model, universe_model, media, media_db, scores=scores, max_uptake=max_uptake, inplace=True)
      File "/home/qing/miniconda3/envs/carveme1/lib/python3.10/site-packages/carveme/reconstruction/gapfilling.py", line 139, in multiGapFill
        gapFill(model, universe, constraints=constraints, min_growth=min_growth,
      File "/home/qing/miniconda3/envs/carveme1/lib/python3.10/site-packages/carveme/reconstruction/gapfilling.py", line 74, in gapFill
        raise RuntimeError('Failed to gapfill model for medium {}'.format(tag))
    RuntimeError: Failed to gapfill model for medium macf

I wanted to attach my media file with the Macfarlane composition but can't add a .tsv file here, so I attached an Excel file instead. What I did was save it as a .txt file and then change the extension to .tsv. Could you please have a look and see if it's a formatting problem? macf.xlsx
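One way to rule out a formatting problem before gapfilling is a quick structural check of the media file. The sketch below assumes the media database is tab-separated with `medium`, `description`, `compound`, and `name` columns. The function name `check_media_tsv` and the sample medium IDs are hypothetical.

```python
import csv
import io

EXPECTED = ["medium", "description", "compound", "name"]


def check_media_tsv(text, medium_id):
    """Return a list of human-readable problems found in a media TSV."""
    lines = text.splitlines()
    if not lines:
        return ["file is empty"]
    problems = []
    if "\t" not in lines[0]:
        problems.append("header is not tab-separated")
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in EXPECTED if c not in header]
    if missing:
        problems.append("missing columns: " + ", ".join(missing))
        return problems
    rows = list(reader)
    media = {(r["medium"] or "").strip() for r in rows}
    if medium_id not in media:
        problems.append(f"medium {medium_id!r} not in file (found: {sorted(media)})")
    # The header is line 1, so data rows start at line 2.
    for n, r in enumerate(rows, start=2):
        if not (r["compound"] or "").strip():
            problems.append(f"line {n}: empty compound ID")
    return problems


good = "medium\tdescription\tcompound\tname\nM_base\tBase\tglc__D\tD-Glucose\n"
print(check_media_tsv(good, "M_base"))   # no problems
print(check_media_tsv(good, "M_plus"))   # medium ID not present in the file
```

A file exported from a spreadsheet with commas or spaces instead of tabs would show up here as a header that is not tab-separated. The check also confirms that the ID given to `--gapfill` really occurs in the `medium` column.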

Apart from a formatting problem, I was also thinking it could be the composition of my Macfarlane media, which provides polysaccharides but no simple sugars, because it was designed to grow a microbial community from feces rather than a single strain. If I construct a model for a single MAG, it may not be able to grow in this media. Do you have any thoughts on this?

I really appreciate all your help! Qing

franciscozorrilla commented 9 months ago

Hey Qing,

Indeed, it is possible that there are compounds missing from your media composition that would be required for the growth of your given model. As a sanity check, I would first try gapfilling on some standard/rich media that includes simple sugars, e.g. LB or milk. If you are still getting those errors then there may be something wrong with your model. You may also find it helpful to use the minimal_medium cobrapy function to inform/update your media composition. It is possible that your strain lacks the genes required for complex polysaccharide degradation, or the genes are present but not accurately annotated, or the specific reactions are not present in the models/database in the first place (e.g. mucin degradation). Good luck!
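A minimal sketch of the minimal_medium suggestion, assuming cobrapy is installed and that exchange reactions in the loaded model are named `EX_<compound>_e` (worth verifying on your own models). The helper names and the model path in the usage comment are hypothetical.

```python
def exchange_to_compound(rxn_id):
    """Strip the assumed EX_ prefix and _e suffix from an exchange reaction ID."""
    core = rxn_id[3:] if rxn_id.startswith("EX_") else rxn_id
    return core[:-2] if core.endswith("_e") else core


def missing_from_media(required_exchange_ids, media_compounds):
    """Compounds the model needs to take up that the media does not list."""
    required = {exchange_to_compound(e) for e in required_exchange_ids}
    return sorted(required - set(media_compounds))


def required_uptakes(sbml_path, min_growth=0.1):
    """Uptake fluxes needed to reach `min_growth`, or None if infeasible."""
    # Imported here so the helpers above can be used without cobrapy.
    import cobra
    from cobra.medium import minimal_medium

    model = cobra.io.read_sbml_model(sbml_path)
    # open_exchanges=True lets the search consider every exchange reaction,
    # not only those open in the model's current medium.
    return minimal_medium(model, min_growth, open_exchanges=True)


# Usage (model path is hypothetical):
#   uptakes = required_uptakes("output_carveme/my_model.xml")
#   if uptakes is not None:
#       print(missing_from_media(uptakes.index, ["glc__D", "nh4"]))
print(missing_from_media(["EX_glc__D_e", "EX_nh4_e"], ["glc__D"]))
```

Anything reported as missing is a candidate to add to the media definition, or a hint that the model needs a different carbon or nitrogen source than the media provides.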

Best, Francisco

Qing-microbiol commented 9 months ago

Hi Francisco,

Thanks for your suggestions!

I have built models gapfilled on M3+glycerol media and simulated them on M3 media with the following code.

CarveMe for all annotated genomes from one sample (community); M17 is the M3+glycerol media:

    for i in $(ls D11/*.faa);do
        fid=$(echo $i | cut -d '/' -f2 | sed 's/.faa$//')
        carve -v --fbc2 --gapfill M17 --mediadb media_db2.tsv -o output_carveme/${fid}.xml $i
    done

SMETANA:

    smetana -m M3 --mediadb media_db2.tsv --flavor fbc2 --solver cplex --detailed -o output_smetana -v output_carveme/*.xml

The models were constructed by CarveMe, but I got the following error for all genomes when running SMETANA:

    Loading community: all
    Running SCS for community all on medium M3...
    Running MUS for community all on medium M3...
    /home/qing/miniconda3/envs/carveme1/lib/python3.10/site-packages/smetana/smetana.py:160: UserWarning: MUS: Failed to find a minimal growth medium for SCHW21-1_D11_MAG_00000003
      warn('MUS: Failed to find a minimal growth medium for ' + org_id)
    /home/qing/miniconda3/envs/carveme1/lib/python3.10/site-packages/smetana/smetana.py:160: UserWarning: MUS: Failed to find a minimal growth medium for SCHW21-1_D11_MAG_00000004
      warn('MUS: Failed to find a minimal growth medium for ' + org_id)
    /home/qing/miniconda3/envs/carveme1/lib/python3.10/site-packages/smetana/smetana.py:160: UserWarning: MUS: Failed to find a minimal growth medium for SCHW21-1_D11_MAG_00000005

Do you have any idea how to fix this error? Thanks again for your help!

Regards, Qing

franciscozorrilla commented 9 months ago

Seems like your model is still having trouble growing in that media. I suggest you look into the specific media requirements of your model, as I explain to another user in this comment https://github.com/franciscozorrilla/metaGEM/issues/111#issuecomment-1313787233 . Also, I would try using cobrapy to check that the models indeed have valid FBA solutions under the media compositions being used with SMETANA.
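A minimal sketch of such a cobrapy check, assuming exchange reactions are named `EX_<compound>_e` and using an uptake limit of 10 as an arbitrary illustrative bound. The function names and the model path in the usage comment are hypothetical.

```python
def media_to_exchange_ids(media_compounds):
    """Map media compound IDs to the assumed exchange reaction naming scheme."""
    return {f"EX_{c}_e" for c in media_compounds}


def growth_on_media(sbml_path, media_compounds, max_uptake=10.0):
    """Return the FBA growth rate with uptake allowed only for media compounds."""
    import cobra  # imported here so the helper above works without cobrapy

    model = cobra.io.read_sbml_model(sbml_path)
    allowed = media_to_exchange_ids(media_compounds)
    for rxn in model.exchanges:
        # A negative lower bound permits uptake; 0 blocks it (secretion stays open).
        rxn.lower_bound = -max_uptake if rxn.id in allowed else 0.0
    solution = model.optimize()
    return solution.objective_value if solution.status == "optimal" else 0.0


# Usage (path and compound list are hypothetical):
#   rate = growth_on_media("output_carveme/my_model.xml", ["glc__D", "nh4", "pi"])
#   print("grows" if rate > 1e-6 else "no growth on this media")
print(sorted(media_to_exchange_ids(["glc__D", "nh4"])))
```

A model that cannot grow alone on the simulation media is not necessarily a problem for community simulation, since partners may supply what is missing. A model that cannot grow even on its gapfilling media does point to a mismatch between the media definition and the model.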