franciscozorrilla / metaGEM

:gem: An easy-to-use workflow for generating context specific genome-scale metabolic models and predicting metabolic interactions within microbial communities directly from metagenomic data
https://franciscozorrilla.github.io/metaGEM/
MIT License

SMETANA error: smetana.py:104, Failed to find a solution for growth of ' + org_id #111

Closed White-Shinobi closed 1 year ago

White-Shinobi commented 1 year ago

Dear Francisco,

I'm a PhD student at the University of Groningen, focusing on host-gut microbiome interactions in HIV infection. I have recently been using metaGEM to test my study hypothesis 🐬. Thank you for building metaGEM; it is brilliant work 😊.

May I ask some questions about the logic inside SMETANA?

  1. For many samples, I get logs like the following (X1218_G03_0462.100 is the name of one of my bacterial bins):

     Running SCS for community all on medium MEU8...
     Running MUS for community all on medium MEU8...
     /home/umcg-yzhang/.local/lib/python3.8/site-packages/smetana/smetana.py:104: UserWarning: SCS: Failed to find a solution for growth of X1218_G03_0462.100
       warn('SCS: Failed to find a solution for growth of ' + org_id)

     As I understand it, this means that the bacterium cannot grow in my medium. Is there a way to obtain a minimal medium to fix this problem?

Best, Yue

franciscozorrilla commented 1 year ago

Dear Yue,

Indeed, this suggests that your species X1218_G03_0462.100 is unable to grow in that medium. Could this perhaps be a low-quality or incomplete MAG? Perhaps you need to perform additional or manual gap-filling for this model, or alternatively simply omit this model from your community analysis if possible.

To identify the minimal media of a given community, try running the following command. In my case, I have one model called bacteria.xml and another called yeast.xml, so my minimal_debug.tsv file looks like this:

$ smetana --molweight -v -g -o minimal --debug path/to/models/*.xml

$ paste minimal_debug.tsv 
community   medium  key1    key2    data
all complete    mip ni  ala_B,ca2,cl,cobalt2,cu2,dha,fe2,fe3,ile__L,k,mg2,mn2,o2,orn,pi,so4,thm,tyr__L,val__L,zn2
all complete    mip i   ca2,cl,cobalt2,cu2,dha,fe2,fe3,k,mg2,mn2,o2,orn,pi,so4,thm,val__L,zn2
all complete    mro community   ca2,cl,cobalt2,cu2,fe2,fe3,glyc3p,k,mg2,mn2,o2,orn,so4,thm,val__L,zn2
all complete    mro bacteria    ca2,cl,cobalt2,cu2,fe2,glyc3p,k,mg2,mn2,nh4,o2,pnto__R,so4,thm,val__L,zn2
all complete    mro yeast   ca2,cl,cobalt2,cu2,dha,fe2,fe3,ile__L,k,mg2,mn2,o2,orn,pi,so4,thm,tyr__L,val__L,zn2

The data column specifies a possible minimal media composition for each species in your community, as well as for the community as a whole. Note that these minimal media solutions are not unique. The --molweight flag additionally minimizes the molecular weight of the predicted minimal media composition, as this tends to produce more realistic media.

In the example above, if I were trying to find a minimal medium for the bacteria, I would take the medium predicted above but, for example, replace the carbon source glyc3p to reflect the carbon source in my own medium. The result then needs to be put in the format of CarveMe/SMETANA media files, as shown here for example:

https://github.com/franciscozorrilla/metaGEM/blob/6285b93ea19c371da80acdffc83fa33981fab52f/scripts/media_db.tsv#L1-L20

Below is an example bash command for extracting a media composition from the minimal_debug.tsv file. In this case, I chose to extract the bacteria medium in column form, which can be pasted into an Excel sheet and completed to the specifications shown above.

$ paste minimal_debug.tsv|grep bacteria|cut -f5|tr ',' '\n'
ca2
cl
cobalt2
cu2
fe2
glyc3p
k
mg2
mn2
nh4
o2
pnto__R
so4
thm
val__L
zn2
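The same extraction can be sketched in Python, which makes it easier to swap the carbon source and lay the result out as media-file rows. This is only an illustration: the four-column layout (medium, description, compound, name) is assumed from the linked media_db.tsv, the medium ID MINBAC and the replacement of glyc3p by glc__D are hypothetical choices, and the function names are made up for this sketch.

```python
import csv
import io


def minimal_medium(debug_tsv_text, key1, key2):
    """Return the compound list of the row matching key1/key2 in a SMETANA debug table."""
    reader = csv.DictReader(io.StringIO(debug_tsv_text), delimiter="\t")
    for row in reader:
        if row["key1"] == key1 and row["key2"] == key2:
            return row["data"].split(",")
    raise KeyError(f"no row for {key1}/{key2}")


def to_media_rows(medium_id, description, compounds, swap=None):
    """Build media-file rows; `swap` maps a predicted compound to its replacement."""
    swap = swap or {}
    rows = [("medium", "description", "compound", "name")]  # assumed column layout
    for cpd in compounds:
        cpd = swap.get(cpd, cpd)
        # The name column is filled with the compound ID as a placeholder.
        rows.append((medium_id, description, cpd, cpd))
    return rows


# The bacteria row from the table above, written with tab separators.
example = (
    "community\tmedium\tkey1\tkey2\tdata\n"
    "all\tcomplete\tmro\tbacteria\t"
    "ca2,cl,cobalt2,cu2,fe2,glyc3p,k,mg2,mn2,nh4,o2,pnto__R,so4,thm,val__L,zn2\n"
)

compounds = minimal_medium(example, "mro", "bacteria")
rows = to_media_rows(
    "MINBAC",                        # hypothetical medium ID
    "Minimal medium for bacteria",   # hypothetical description
    compounds,
    swap={"glyc3p": "glc__D"},       # hypothetical carbon-source replacement
)
for row in rows:
    print("\t".join(row))
```

Unlike the grep in the bash pipeline, which matches any line containing the word bacteria, this sketch selects the row by its key1 and key2 columns.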

Best wishes, Francisco

White-Shinobi commented 1 year ago

Hi Francisco,

Thank you so much for the quick response 😊. It is very useful. I found that my medium doesn't contain O2, which is required by many of the species 🌻.
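A check like this can be sketched as a set difference between the compounds in a predicted minimal medium and the compounds in the medium actually used. The required list below is the bacteria row from Francisco's example, and the medium is a hypothetical one built by dropping oxygen from that list; it is not the actual M8 medium.

```python
def missing_from_medium(required, medium):
    """Return the compounds in `required` that do not appear in `medium`, sorted."""
    return sorted(set(required) - set(medium))


# Predicted minimal medium for the bacteria model in the example above.
required = (
    "ca2,cl,cobalt2,cu2,fe2,glyc3p,k,mg2,mn2,nh4,o2,pnto__R,so4,thm,val__L,zn2"
).split(",")

# Hypothetical medium for illustration: the same compounds without oxygen.
my_medium = [cpd for cpd in required if cpd != "o2"]

print(missing_from_medium(required, my_medium))  # ['o2']
```

Because predicted minimal media are not unique, a compound reported as missing here is a hint to investigate rather than proof that the model cannot grow without it.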

Because this warning comes from the sc_score calculation, I checked the SMETANA code for the score calculations 👉 https://github.com/cdanielmachado/smetana/blob/master/smetana/smetana.py

In the sc_score function, the solution for the metabolic model is calculated as:

sol = solver.solve(objective, minimize=True, get_values=list(objective.keys()))

(solver.solve: https://github.com/cdanielmachado/reframed/blob/a26051a254baef521086c435db1ac2231e22d3c1/reframed/solvers/solver.py )

As the paper says, this solution is meant to "identify the minimal number of member species necessary to support the growth of the target species". This solve function uses the cplex function from the CPLEX module. In my situation, CPLEX did not find an optimal solution for my objective, so SMETANA shows "Failed to find a solution for growth of ' + org_id". The CPLEX solution status codes are listed here: https://www.ibm.com/docs/en/icos/20.1.0?topic=micclcarm-solution-status-codes-by-number-in-cplex-callable-library-c-api

But what I don't understand is how the medium data is used to calculate the sc_score? Is the medium data loaded inside the sc_score function? I do not understand how SMETANA uses the medium data, and I could not find this in the paper. I don't know whether CPLEX fails to find an optimal solution for the objective because of the medium composition.

From Daniel's other code, it seems the medium data is added to the Environment class?

Thank you again for your help!

Best, Yue


White-Shinobi commented 1 year ago

Hi Francisco,

Sorry, I should show more information:

Bin quality: For this X1218 sample, I got 30 bacterial bins. These bins have already passed quality control, i.e. they are the ones that were not filtered out. Their completeness is higher than 50% and their contamination is lower than 5% (as I recall, I set the thresholds as in your metaGEM paper).

Specific errors: In the log file, 2 of the 30 bins show the warning "SCS: Failed to find a solution for growth of binX" (warn('SCS: Failed to find a solution for growth of ' + org_id)), whereas all 30 bins show the warning "MUS: Failed to find a minimal growth medium for BinX" (warn('MUS: Failed to find a minimal growth medium for ' + org_id)).
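To tally which bins trigger which warning, a log can be scanned with a regular expression. The sketch below assumes the warning wording quoted in this thread. The SCS lines in the sample are taken from the log in the first comment (path shortened), while the MUS lines use the placeholder name binX, since no real MUS log line appears in the thread. The function name is made up for illustration.

```python
import re
from collections import defaultdict

# Matches the two warning messages quoted in this thread and captures the bin ID.
WARNING_RE = re.compile(
    r"(SCS|MUS): Failed to find a "
    r"(?:solution for growth of|minimal growth medium for) (\S+)"
)


def failing_bins(log_text):
    """Map each score name (SCS or MUS) to the sorted bin IDs that triggered a warning."""
    failures = defaultdict(set)
    for line in log_text.splitlines():
        # Python echoes the source line of each warning; skip those echoes.
        if line.strip().startswith("warn("):
            continue
        match = WARNING_RE.search(line)
        if match:
            failures[match.group(1)].add(match.group(2))
    return {score: sorted(bins) for score, bins in failures.items()}


sample_log = """\
Running SCS for community all on medium MEU8...
Running MUS for community all on medium MEU8...
smetana/smetana.py:104: UserWarning: SCS: Failed to find a solution for growth of X1218_G03_0462.100
  warn('SCS: Failed to find a solution for growth of ' + org_id)
UserWarning: MUS: Failed to find a minimal growth medium for binX
  warn('MUS: Failed to find a minimal growth medium for ' + org_id)
"""

print(failing_bins(sample_log))
```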

How I gap-filled and tested cross-feeding: In CarveMe, I used M8+MEUdiet (~130 BiGG metabolites in total) for gap-filling; in SMETANA, I used M8 to get the cross-feeding results.

The core question about the results: For many samples, the SMETANA result is empty and contains only the column names, like below 👇

community medium receiver donor compound scs mus mps smetana

I have 133 samples in total (all human gut metagenomic data), but 43 samples show errors like this and have empty results. I also checked the SMETANA results from your metaGEM paper, and as I recall around 30 of the results for the human microbiome data are empty 👉 https://zenodo.org/record/5593224/files/SMETANA.tar.gz?download=1

Why are these 30 samples empty? Do they show the same error as mine? Do you think it is a good idea to get the minimal medium for the bins in these samples and re-run SMETANA for them?

On the other hand, I have already used MEMOTE to check the quality of the GEMs, and the results are very comparable to the levels in your paper. So I think the problem does not come from CarveMe.
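Counting the empty results across many samples can be sketched as follows. The header is the one quoted above; the sample names and the single data row are invented purely for illustration, and the helper name is hypothetical.

```python
def is_empty_result(tsv_text):
    """True if a SMETANA result contains at most a header line and no data rows."""
    lines = [line for line in tsv_text.splitlines() if line.strip()]
    return len(lines) <= 1


header = "community\tmedium\treceiver\tdonor\tcompound\tscs\tmus\tmps\tsmetana"

# Hypothetical results: sampleA has only the header, sampleB has one made-up row.
results = {
    "sampleA": header + "\n",
    "sampleB": header + "\nall\tM8\tbin1\tbin2\tM_ac_e\t1.0\t0.5\t1\t0.5\n",
}

empty = sorted(name for name, text in results.items() if is_empty_result(text))
print(empty)  # ['sampleA']
```

For files on disk, the same function could be applied to the text returned by pathlib.Path.read_text() for each result file.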

Thank you so much for your great pipeline. I really learned a lot 😊 from it.

Best, Yue


White-Shinobi commented 1 year ago

Hi Francisco,

When I tried the smetana --debug command, I ran into the same issue as in this question 👇 MRO: Failed to find a valid solution

But this problem only appears when I run all 30 GEMs together; there is no error when I run it on only one XML file.

Best, Yue

franciscozorrilla commented 1 year ago

Hi Yue,

> But what I don't understand is how the medium data is used to calculate the sc_score?

https://github.com/cdanielmachado/smetana/blob/d2b10434df6d13741614420b933b0870ea598222/smetana/smetana.py#L11-L27

The species coupling score is calculated under a given media composition. As you can see in the function definition, it takes an Environment parameter.
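As a toy illustration of the general idea that a medium acts as a set of uptake limits on exchange reactions, consider the sketch below. It is not the SMETANA or reframed code: the reaction IDs, the sign convention (negative flux means uptake), and the uptake limit of 10 are all assumptions made for this example.

```python
import math


def medium_to_bounds(exchange_reactions, medium, max_uptake=10.0):
    """Map each exchange reaction to (lower, upper) flux bounds for a given medium.

    Compounds in the medium may be taken up (negative lower bound);
    all other compounds can only be secreted (lower bound of zero).
    """
    bounds = {}
    for reaction_id, compound in exchange_reactions.items():
        lower = -max_uptake if compound in medium else 0.0
        bounds[reaction_id] = (lower, math.inf)
    return bounds


# Hypothetical exchange reactions of a small model.
exchanges = {"EX_o2": "o2", "EX_nh4": "nh4", "EX_glyc3p": "glyc3p"}

# Hypothetical medium that lacks oxygen.
bounds = medium_to_bounds(exchanges, medium={"nh4", "glyc3p"})
for reaction_id, (lower, upper) in bounds.items():
    print(reaction_id, lower, upper)
```

Under these assumptions, a model that needs oxygen has no feasible growth solution once EX_o2 is limited to secretion only, which would be consistent with the solver failing when a required compound is absent from the medium.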

> Specific errors: In the log file, 2 of the 30 bins show the warning "SCS: Failed to find a solution for growth of binX" (warn('SCS: Failed to find a solution for growth of ' + org_id)), whereas all 30 bins show the warning "MUS: Failed to find a minimal growth medium for BinX" (warn('MUS: Failed to find a minimal growth medium for ' + org_id)).

Have you seen this post in the CarveMe issues section? Perhaps try adding the --flavor bigg parameter to your SMETANA commands to see if it resolves the issue.

> Why are these 30 samples empty? Do they show the same error as mine? Do you think it is a good idea to get the minimal medium for the bins in these samples and re-run SMETANA for them?

Indeed, we had some empty SMETANA results, as I discussed in this comment. Some of the empty samples were due to prohibitively long runtimes caused by large community size, while others were due to small communities not predicted to require any metabolite exchanges. Yes, perhaps try running with a more complete or different medium.

> When I tried the smetana --debug command, I ran into the same issue as in this question 👇 https://github.com/cdanielmachado/smetana/issues/28

I suggest you add a comment to that issue and ask iulia whether she ever figured it out, perhaps also briefly describing your specific case.

> But this problem only appears when I run all 30 GEMs together; there is no error when I run it on only one XML file.

Interesting ... thanks for sharing this. Do you know the taxonomy of the two bins that are failing? Are they perhaps highly fragmented genomes? Can you check the number of reactions and metabolites in those models?
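One way to get these counts without loading a modelling library is to count the reaction and species elements in the SBML file directly. The sketch below uses only the Python standard library and a tiny made-up SBML document; it assumes that each species element corresponds to one metabolite (one entry per compartment), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def count_sbml_elements(sbml_text):
    """Count <reaction> and <species> elements in an SBML document, ignoring namespaces."""
    root = ET.fromstring(sbml_text)
    counts = {"reactions": 0, "metabolites": 0}
    for element in root.iter():
        tag = element.tag.rsplit("}", 1)[-1]  # strip the "{namespace}" prefix
        if tag == "reaction":
            counts["reactions"] += 1
        elif tag == "species":
            counts["metabolites"] += 1
    return counts


# Made-up two-metabolite, one-reaction model for illustration only.
toy_sbml = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="toy">
    <listOfSpecies>
      <species id="M_a_c" compartment="c"/>
      <species id="M_b_c" compartment="c"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="R_1" reversible="false">
        <listOfReactants><speciesReference species="M_a_c" stoichiometry="1"/></listOfReactants>
        <listOfProducts><speciesReference species="M_b_c" stoichiometry="1"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""

print(count_sbml_elements(toy_sbml))  # {'reactions': 1, 'metabolites': 2}
```

For a real model, the file contents can be read with open(path, "rb").read() and passed to the same function, which makes it easy to compare the two failing bins against the rest of the community.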

Best, Francisco