cheng-yu-zhang closed this 1 year ago.
@cheng-yu-zhang Could you please explain what exactly was done in this PR (and the other two that you opened)? For example: where did you get the information from, why did you make these changes, and were there any special cases or considerations? What solved the growth problem that you encountered?
@edkerk That's my fault; I will add more details.
I have reorganized the data to fit #302: `DBnewRxnsMets.tsv`, `DBnewRxnsRxns.tsv` and now also `DBnewRxnsGenes.tsv`.
But overall, I'm not convinced that all these reactions should be included. What criteria were used to include them? What experimental evidence is there to support them? [To facilitate this, I changed the layout of the `yeast-GEM.txt` file (cb966bc, using `exportForGit`), which makes for easier diff-ing in 25b724b.]
Some examples:
rxnID | reaction equation | grRule |
---|---|---|
r_4855 | oxygen[c] + Melatonin[c] => Formyl-N-acetyl-5-methoxykynurenamine[c] | YJR078W |
r_4810 | oxygen[c] + Serotonin[c] => Formyl-5-hydroxykynurenamine[c] | YJR078W |
These are probably not correct. The breakdown of melatonin and serotonin, which are not yeast metabolites, has the same EC number as the reaction from tryptophan to N-formyl-kynurenine, which is a reaction in NAD biosynthesis. Actually, there are four reactions in this map with the same EC number, but only one of these is part of a functional pathway.
There are more examples like this, also based on MetaCyc. So how were these reactions selected?
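One way to catch this pattern before reviewing reactions individually is to group the candidate reactions by EC number and look at every EC number that maps to more than one reaction. A minimal Python sketch, assuming the candidates are available as (reaction ID, EC number) pairs; the EC labels and the `r_new*` IDs below are placeholders, not the real annotations from the draft models:

```python
from collections import defaultdict


def reactions_sharing_ec(candidates):
    """Group (reaction ID, EC number) pairs by EC number and keep only the
    EC numbers that map to more than one reaction."""
    by_ec = defaultdict(list)
    for rxn_id, ec in candidates:
        by_ec[ec].append(rxn_id)
    return {ec: ids for ec, ids in by_ec.items() if len(ids) > 1}


# Hypothetical input: "EC_A"/"EC_B" are placeholder labels and r_new1/r_new2
# are made-up IDs; only r_4855 and r_4810 come from the table above.
candidates = [
    ("r_4855", "EC_A"),  # melatonin breakdown
    ("r_4810", "EC_A"),  # serotonin breakdown
    ("r_new1", "EC_A"),  # e.g. tryptophan -> N-formylkynurenine
    ("r_new2", "EC_B"),
]
print(reactions_sharing_ec(candidates))
# {'EC_A': ['r_4855', 'r_4810', 'r_new1']}
```

Each group that comes out still has to be judged by hand: the point is only to make the "one EC number, several reactions" cases visible.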
Then there are also other problematic reactions. The following two reactions modify proteins, which is outside the scope of a metabolic network. Moreover, they are actually half-reactions of pyruvate dehydrogenase and alpha-ketoglutarate dehydrogenase (both already in the model and associated with the same genes), so there is no need to include them:
rxnID | reaction equation | grRule |
---|---|---|
r_4833 | coenzyme A[m] + pyruvate-dehydrogenase-acetylDHlipoyl[m] => acetyl-CoA[m] + pyruvate-dehydrogenase-dihydrolipoate[m] | YNL071W |
r_4834 | succinyl-CoA[m] + N6-dihydrolipoyl-L-lysine[m] <=> coenzyme A[m] + N6-S-succinyldihydrolipoyl-L-lysine[m] | YDR148C |
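Reactions like these can be pre-screened by looking for metabolite names that point to protein-bound intermediates. The sketch below is a crude keyword heuristic; the `PROTEIN_BOUND_HINTS` list is a hypothetical assumption, not a curated vocabulary, and every hit still needs a manual check against the complete reactions already in the model:

```python
# Hypothetical keyword list for protein-bound intermediates (lipoyl arms,
# enzyme-bound forms); a heuristic that will need tuning.
PROTEIN_BOUND_HINTS = ("lipoyl", "lipoate", "-dehydrogenase-", "acyl-carrier")


def flag_protein_modifying(reactions):
    """Return IDs of reactions with at least one metabolite name containing
    a protein-bound hint (case-insensitive)."""
    return [
        rxn_id
        for rxn_id, met_names in reactions.items()
        if any(hint in name.lower() for name in met_names for hint in PROTEIN_BOUND_HINTS)
    ]


reactions = {
    "r_4833": ["coenzyme A", "pyruvate-dehydrogenase-acetylDHlipoyl",
               "acetyl-CoA", "pyruvate-dehydrogenase-dihydrolipoate"],
    "r_4834": ["succinyl-CoA", "N6-dihydrolipoyl-L-lysine",
               "coenzyme A", "N6-S-succinyldihydrolipoyl-L-lysine"],
    "r_0916": ["ATP", "ribose-5-phosphate", "AMP", "H+", "PRPP"],
}
print(flag_protein_modifying(reactions))  # ['r_4833', 'r_4834']
```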
There are other reactions that act on non-specific substrates:
rxnID | reaction equation | grRule |
---|---|---|
r_4755 | 2 H+[c] + H2O[c] + L-Selenocystathionine[c] => ammonium[c] + pyruvate[c] + Selenohomocysteine[c] | YGL184C or YHR112C or YFR055W |
r_4835 | H2O[c] + S-Substituted-L-Cysteines[c] => ammonium[c] + pyruvate[c] + Thiols[c] | YGL184C or YFR055W |
There has been some discussion about including non-specific substrates (#219), but these genes are already associated with existing reactions (r_0308), so there is no value in including them as non-specific reactions.
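A quick check for this case is to list, for each new reaction, the existing reactions that share at least one gene. A sketch under the assumption that grRules have already been reduced to sets of gene IDs; the gene set shown for r_0308 is illustrative only (not read from the model), and `r_9999`/`gene_x` are hypothetical:

```python
def existing_reactions_with_same_genes(new_rxns, existing_rxns):
    """For each new reaction, list the existing reactions that share at
    least one gene with it. Both arguments map reaction ID -> set of genes."""
    overlap = {}
    for new_id, new_genes in new_rxns.items():
        hits = sorted(rid for rid, genes in existing_rxns.items() if new_genes & genes)
        if hits:
            overlap[new_id] = hits
    return overlap


new_rxns = {
    "r_4755": {"YGL184C", "YHR112C", "YFR055W"},
    "r_4835": {"YGL184C", "YFR055W"},
}
# Illustrative gene sets only; the real associations should be read from the model.
existing_rxns = {
    "r_0308": {"YGL184C", "YFR055W"},
    "r_9999": {"gene_x"},  # hypothetical unrelated reaction
}
print(existing_reactions_with_same_genes(new_rxns, existing_rxns))
# {'r_4755': ['r_0308'], 'r_4835': ['r_0308']}
```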
There are also examples of fluorinated and chlorinated compounds that would not occur in S. cerevisiae.
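Such compounds can be flagged from the metabolite formulas. This sketch assumes plain formulas like `C7H5ClO2` (no parentheses, charges or isotopes) and uses made-up metabolite IDs:

```python
import re

HALOGENS = {"F", "Cl", "Br", "I"}


def elements_in(formula):
    """Element symbols in a simple formula such as 'C7H5ClO2'.
    Parentheses, charges and isotopes are not handled."""
    return set(re.findall(r"[A-Z][a-z]?", formula))


def flag_halogenated(formulas):
    """Return the IDs of metabolites whose formula contains a halogen."""
    return [met_id for met_id, formula in formulas.items()
            if elements_in(formula) & HALOGENS]


# Made-up metabolite IDs and formulas for illustration.
formulas = {"s_x1": "C7H5ClO2", "s_x2": "C6H12O6", "s_x3": "C2HF3O2"}
print(flag_halogenated(formulas))  # ['s_x1', 's_x3']
```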
Overall: the list of new reactions should be carefully curated, to make sure that the reactions that are added make sense. More reactions are not per se better, even if they would not directly affect some of the model metrics (predicted growth rate, gene essentiality, etc.).
@edkerk Are there any remaining issues with the new reactions that I need to fix?
I have refactored the script and the location of the data files to match the generic curation format introduced in #313. See `code/modelCuration/v8_6_1.m` for how the model curation is performed.
I reiterate the conclusion of the previous comment: the list of new reactions should be carefully curated, to make sure that the reactions that are added make sense. More reactions are not per se better, even if they would not directly affect some of the model metrics (predicted growth rate, gene essentiality, etc.).
So you should go through the list of reactions one by one and manually check whether they make sense. You uploaded draft models from KEGG and MetaCyc, but there is no explanation of which reactions were then included and why. I quickly looked through the new reactions and found some more issues:
rxnID | reaction equation | grRule |
---|---|---|
r_0916 | ATP[c] + ribose-5-phosphate[c] => AMP[c] + H+[c] + PRPP[c] | (YKL181W and YER099C) or (YKL181W and YHL011C) or (YKL181W and YBL068W) or (YER099C and YOL061W) or (YBL068W and YOL061W) |
r_4723 | ATP[c] + D-ribose 5-phosphate[c] <=> AMP[c] + H+[c] + 5-Phospho-alpha-D-ribose 1-diphosphate[c] | YBL068W or YHL011C or YER099C or YOL061W or YKL181W |
The first reaction was already present; although the second reaction uses different metabolite names, it represents the same reaction. This also highlights that there are duplicate metabolites; without them, the duplicate reaction would have been easier to spot.
As shown above: even if the reaction had not been a duplicate, `ribose-5-phosphate` and `D-ribose 5-phosphate` are very likely the same metabolite, so make sure that only one of them is present.
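Duplicates like this are easier to find once metabolite synonyms are collapsed. The sketch below compares reactions by their stoichiometry, ignoring direction. The `SYNONYMS` map is hypothetical and would have to be curated by hand, and compartments are left out because both reactions are in [c]:

```python
# Hypothetical synonym map; it could be built from shared metabolite
# annotations after manual checking.
SYNONYMS = {
    "D-ribose 5-phosphate": "ribose-5-phosphate",
    "5-Phospho-alpha-D-ribose 1-diphosphate": "PRPP",
}


def reaction_key(stoich):
    """Direction-independent key for a reaction given as
    {metabolite name: coefficient}, negative for substrates."""
    canon = {}
    for name, coeff in stoich.items():
        name = SYNONYMS.get(name, name)
        canon[name] = canon.get(name, 0) + coeff
    forward = frozenset(canon.items())
    backward = frozenset((m, -c) for m, c in canon.items())
    return frozenset([forward, backward])


def find_duplicates(reactions):
    """Return (first ID, duplicate ID) pairs of reactions with equal keys."""
    seen, pairs = {}, []
    for rxn_id, stoich in reactions.items():
        key = reaction_key(stoich)
        if key in seen:
            pairs.append((seen[key], rxn_id))
        else:
            seen[key] = rxn_id
    return pairs


reactions = {
    "r_0916": {"ATP": -1, "ribose-5-phosphate": -1, "AMP": 1, "H+": 1, "PRPP": 1},
    "r_4723": {"ATP": -1, "D-ribose 5-phosphate": -1, "AMP": 1, "H+": 1,
               "5-Phospho-alpha-D-ribose 1-diphosphate": 1},
}
print(find_duplicates(reactions))  # [('r_0916', 'r_4723')]
```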
rxnID | reaction equation | grRule |
---|---|---|
r_0481 | glutathione disulfide[c] + H+[c] + NADPH[c] => 2 glutathione[c] + NADP(+)[c] | (YCL035C and YPL091W) or (YDR098C and YPL091W) or (YDR513W and YPL091W) or (YER174C and YPL091W) |
r_4711 | 2 glutathione[c] + NAD[c] <=> glutathione disulfide[c] + H+[c] + NADH[c] | YPL091W |
The first reaction reflects how glutathione oxidoreductase is widely accepted to function. The new reaction is reversible, uses NADH and has a much simpler gene association. What strong evidence is there to include the second one?
In both examples above, the new reactions have much simpler gene associations, while the old reactions indicate complexes with subunits. What strong evidence is there for the simplified gene associations?
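Cases where a complex is reduced to single genes can be flagged automatically by parsing the grRules. A minimal sketch, assuming the rules are in disjunctive form like the ones in the tables above (`(A and B) or C`, no nested parentheses):

```python
import re


def parse_gr_rule(rule):
    """Split a grRule like '(A and B) or C' into a list of gene sets, one per
    isoenzyme or complex. Assumes disjunctive form without nested parentheses."""
    complexes = []
    for part in re.split(r"\s+or\s+", rule.strip(), flags=re.IGNORECASE):
        genes = re.split(r"\s+and\s+", part.strip().strip("()"), flags=re.IGNORECASE)
        complexes.append(frozenset(g.strip() for g in genes))
    return complexes


def loses_complexes(old_rule, new_rule):
    """True if the old rule contains a multi-subunit complex but the new rule
    only lists single genes."""
    old_has_complex = any(len(c) > 1 for c in parse_gr_rule(old_rule))
    new_has_complex = any(len(c) > 1 for c in parse_gr_rule(new_rule))
    return old_has_complex and not new_has_complex


old_r0481 = ("(YCL035C and YPL091W) or (YDR098C and YPL091W) or "
             "(YDR513W and YPL091W) or (YER174C and YPL091W)")
new_r4711 = "YPL091W"
print(loses_complexes(old_r0481, new_r4711))  # True
```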
It is also worthwhile to have another look at the previous comment, as those issues are not fully resolved. How is the localization determined? Be very careful with reactions predicted by MetaCyc, as it can quickly draw in non-native substrates.
@edkerk Hi Ed, I have run into a problem. I cannot run `deletion = cobra.flux_analysis.deletion.double_gene_deletion(model, gene_list1=pair1, gene_list2=pair2)` in Python with yeast-GEM from either the main or the develop branch. Changing the cobrapy version does not solve it either, so I am wondering whether `saveYeastModel.m` has changed.
The error is below:
    Traceback (most recent call last):
      File "D:\Anaconda\envs\python38\lib\site-packages\IPython\core\interactiveshell.py", line 3444, in run_code
        exec(code_obj, self.user_global_ns, self.user_ns)
      File "", line 47, in
        deletion = double_gene_deletion(model,
      File "D:\Anaconda\envs\python38\lib\site-packages\cobra\flux_analysis\deletion.py", line 393, in double_gene_deletion
        return _multi_deletion(
      File "D:\Anaconda\envs\python38\lib\site-packages\cobra\flux_analysis\deletion.py", line 144, in _multi_deletion
        with ProcessPool(
      File "D:\Anaconda\envs\python38\lib\site-packages\cobra\util\process_pool.py", line 56, in __init__
        pickle.dump((initializer,) + initargs, handle)
    TypeError: cannot pickle 'SwigPyObject' object
@edkerk @hongzhonglu Is there any way to solve the above problem?
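The traceback shows that the error is raised when cobrapy's `ProcessPool` pickles its initializer arguments (which presumably include the model) to start worker processes. Two hedged suggestions, not verified on yeast-GEM: passing `processes=1` to `double_gene_deletion` should make cobrapy run the deletions serially without the pool, and the helper below (a hypothetical diagnostic, not part of cobrapy) lists which top-level attributes of an object fail to pickle on their own:

```python
import pickle
import threading


def unpicklable_attributes(obj):
    """Return names of instance attributes that fail to pickle on their own.
    Only the top level of vars(obj) is inspected; nested objects are not searched."""
    bad = []
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception:  # TypeError, PicklingError, AttributeError, ...
            bad.append(name)
    return bad


# Toy object standing in for the model: the lock attribute cannot be pickled.
class Holder:
    def __init__(self):
        self.name = "ok"
        self.lock = threading.Lock()


print(unpicklable_attributes(Holder()))  # ['lock']
```

Calling `pickle.dumps(model)` first shows whether the model as a whole is affected; `unpicklable_attributes(model)` can then narrow it down. Attributes that the model serializes in a special way may also show up, so treat the result as a lead rather than a diagnosis.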
Hmm, even if `saveYeastModel` has changed, it would still produce a valid SBML file that cobrapy should be able to import without issues. Just to confirm that it really is a problem with the model itself: have you tried running it on another model (not yeast-GEM; maybe E. coli)?
@edkerk @hongzhonglu `double_gene_deletion` and `single_gene_deletion` run fine on iML1515 and yeast-GEM 8.5, but with the latest yeast-GEM something goes wrong.
However, the double gene deletion does run in MATLAB, so the problem seems solvable. I am working on it.
I went through all suggested reactions and checked them one by one. Given the quality of the current yeast-GEM, one should be careful about including new reactions: there should be more evidence than the reaction appearing in KEGG. I checked them with the following strategy:
- Check that the reaction is not a partial reaction that is already represented in the model as the complete reaction.
- Compare the new reaction with existing reactions annotated to the same gene: if there is a difference (e.g. in substrate or co-factor), look for evidence in the literature that the new reaction is supported and/or likely to be present. Do not rely only on KEGG or UniProt; search for more solid evidence.
- If the above checks pass, see whether the reactants and/or products connect to existing metabolites. If so, include the reaction in that compartment, but do not add it to other compartments; that should rather be addressed by a thorough curation of all reaction compartmentalizations. If the reaction does not connect to the existing metabolic network, just add it to whichever compartment is suggested.
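The third step can be sketched as follows. This is a minimal illustration, assuming the model's metabolites are available as (name, compartment) pairs; the `CURRENCY` set is a hypothetical choice so that ubiquitous cofactors such as H2O do not count as a connection in every compartment, and the metabolite names in the example are made up:

```python
# Hypothetical list of ubiquitous metabolites that should not count as a
# connection, since they exist in nearly every compartment.
CURRENCY = {"H2O", "H+", "ATP", "ADP", "AMP", "phosphate", "NAD", "NADH",
            "NADP(+)", "NADPH", "oxygen", "carbon dioxide", "coenzyme A"}


def choose_compartments(met_names, suggested, existing):
    """Return the suggested compartments in which the reaction shares at least
    one non-currency metabolite with the model; if it connects nowhere,
    return all suggested compartments.

    met_names: metabolite names in the candidate reaction
    suggested: compartments proposed by the draft model, e.g. ["c", "m"]
    existing:  set of (name, compartment) pairs already in the model
    """
    specific = [m for m in met_names if m not in CURRENCY]
    connected = [c for c in suggested if any((m, c) in existing for m in specific)]
    return connected if connected else list(suggested)


existing = {("H2O", "c"), ("H2O", "m"), ("pyruvate", "c")}
print(choose_compartments(["H2O", "metabolite A", "pyruvate"], ["c", "m"], existing))  # ['c']
print(choose_compartments(["H2O", "metabolite X"], ["c", "m"], existing))  # ['c', 'm']
```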
I agree with the detailed strategy. With a standard workflow, we can add new reactions more efficiently and credibly.
Main improvements in this PR:
`Saccharomyces_cerevisiae_draftmodel_kegg` and `Saccharomyces_cerevisiae_draftmodel_metacyc`
I hereby confirm that I have:
- selected `develop` as a target branch (top left drop-down menu)