SysBioChalmers / yeast-GEM

The consensus GEM for Saccharomyces cerevisiae
http://sysbiochalmers.github.io/yeast-GEM/
Creative Commons Attribution 4.0 International

feat: confidence scores for reactions #36

Closed hongzhonglu closed 6 years ago

hongzhonglu commented 6 years ago

Using the confidence score scheme below, we can add a confidence score to each reaction in the yeast model.

| Evidence type | Confidence score | Examples |
| --- | --- | --- |
| Biochemical data | 4 | Direct evidence for gene product function and biochemical reaction: protein purification, biochemical assays, experimentally solved protein structures, and comparative gene-expression studies. |
| Genetic data | 3 | Direct and indirect evidence for gene function: knock-out characterization, knock-in characterization, and over-expression. |
| Physiological data | 2 | Indirect evidence for biochemical reactions based on physiological data: secretion products or defined medium components serve as evidence for transport and metabolic reactions. |
| Sequence data | 2 | Evidence for gene function: genome annotation, SEED annotation. |
| Modeling data | 1 | No evidence is available but the reaction is required for modeling. The included function is a hypothesis and needs experimental verification. The reaction mechanism may be different from the included reaction(s). |
| Not evaluated | 0 | - |

With confidence scores in place, it becomes clearer where the model needs work, so we can improve its quality consistently. What do you think? @all
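For illustration, the score table in this comment can be written as a small lookup. This is only a sketch: the names `CONFIDENCE_SCORES` and `score_for` are hypothetical and are not part of yeast-GEM, RAVEN, or COBRA, and treating an unknown evidence type as "not evaluated" (0) is an assumption.

```python
# Evidence types and scores exactly as listed in the table above.
CONFIDENCE_SCORES = {
    "biochemical data": 4,
    "genetic data": 3,
    "physiological data": 2,
    "sequence data": 2,
    "modeling data": 1,
    "not evaluated": 0,
}


def score_for(evidence_type: str) -> int:
    """Return the score for one evidence type.

    Assumption: an evidence type missing from the table is treated
    as "not evaluated" and scores 0.
    """
    return CONFIDENCE_SCORES.get(evidence_type.strip().lower(), 0)
```

For example, `score_for("Genetic data")` looks up the "Genetic data" row and returns its score.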

edkerk commented 6 years ago

Excellent. RAVEN 2.0 also supports confidenceScores as a field.

BenjaSanchez commented 6 years ago

@hongzhonglu I like the idea! How did you come up with the score system? Something that is not clear to me is if the confidence scores are cumulative, e.g. what happens if we have genetic data but no physiological data?

simas232 commented 6 years ago

@hongzhonglu, to my knowledge the COBRA 3 developers have rescaled the confidence score system to run from 0 (worst) to 5 (best). So I suggest that we make our confidence scores compliant with COBRA 3 as well. I got this information from [COBRA directory]/docs/source/notes/COBRA_structure_fields.xlsx.

hongzhonglu commented 6 years ago

@simas232 Good! @BenjaSanchez The original score system is from "A protocol for generating a high-quality genome-scale metabolic reconstruction". As for "what happens if we have genetic data but no physiological data?": genetic data can be more valuable, so the score could be 3 even if there are no physiological data?
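One way to read this reply is that the scores are not cumulative: a reaction takes the score of its strongest available evidence type. A minimal sketch of that reading follows. The helper `reaction_confidence` is hypothetical, and the "take the maximum" rule is an interpretation of the comment above, not a rule confirmed in this thread or in the cited protocol.

```python
from typing import Iterable

# Scores per evidence type, copied from the table earlier in this issue.
SCORES = {
    "biochemical data": 4,
    "genetic data": 3,
    "physiological data": 2,
    "sequence data": 2,
    "modeling data": 1,
    "not evaluated": 0,
}


def reaction_confidence(evidence: Iterable[str]) -> int:
    """Score a reaction by its strongest evidence type (assumed rule).

    Scores are not summed. A reaction with no listed evidence, or only
    unrecognized evidence types, scores 0 (not evaluated).
    """
    return max(
        (SCORES.get(e.strip().lower(), 0) for e in evidence),
        default=0,
    )
```

Under this reading, a reaction with genetic data but no physiological data still scores 3, and adding physiological data on top of genetic data does not raise the score.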

hongzhonglu commented 6 years ago

@simas232 Did you find out how the scores 0-5 are defined for reactions? I could only find the definitions in the table above.

simas232 commented 6 years ago

@hongzhonglu, no, I didn't find definitions for the new grading scale. The COBRA 3.0 manuscript is available online, but it doesn't include a table with the descriptions.

hongzhonglu commented 6 years ago

@simas232, thanks! I also checked the scores in the latest E. coli model. There, too, 4 is the highest score across all reactions. So let us just use the score system in the table above.

BenjaSanchez commented 6 years ago

The confidence scores have been added in commit f787058 using an automatic function. Some manual curation remains, but for now this issue will be closed.