Sco-GEM 1.3.0

edkerk commented 3 years ago

Main improvements in this PR:

fix:
- correct biocyc annotation ZCAROTDH2 (closes #33)
- UniProt ID, PFAM, Panther, GO term and RefSeq annotations as gathered from UniProt (closes #44)
- correct invalid KEGG metabolite IDs identified in Sco4 paper (closes #62)
- gene names are gathered from the Sanger genome annotation, strepDB, UniProt and additional literature (closes #64)
- directionality of ICDHyr (closes #87)
- remove reaction with incorrect co-factor (G6PDH1b) (closes #88)
- remove unnecessary anomers of carbohydrates, only keeping the generic forms (closes #89)
- directionality of ferredoxin-NADP reductase (closes #90)
- remove incorrect gene in GPR for CAT reaction (closes #100)
- curated duplicated reactions, involved GABTA, ABPYRATA, CA2abc1 and AMEt (closes #104)
- correct and deprecate multiple reaction and metabolite annotations (closes #105)
- remove reaction GND2, wrong co-factor (closes #106)
- remove multiple annotations when only one is correct (KEGG, ec-code) (closes #111)
- acetyl-CoA rewiring from iKS1317 (closes #127)
- remove unused genes, not annotated to any reaction
- remove inconsistent and unnecessary <notes> entries, with information already covered elsewhere
- export.py recognizes SBO terms, adheres to standard-GEM, and includes a subroutine to load previous releases of the model (export.get_earlier_model_unversioned)
- remove two unnecessary boundary metabolites (closes #136)
feat:
- protein sequence, length and mass are gathered from UniProt (closes #44)
- increaseVersion.py function that should be run on master branch to prepare a new version release (closes #133)
- template curation script code/curation/vx_x_x.py
refactor:
- renamed folders and reformatted README.md to adhere to standard-GEM
- all model files (xml and yml for now) are located in the model/ folder (closes #110)
- moved scripts, data and ec-models specific for Sulheim et al. (2020) to dedicated folders in /code and /data (closes #110)
- separate requirements.txt for code/sulheim2020/, reducing the packages in /requirements.txt
- I/O by latest cobrapy (0.20) adds zero charge for metabolites whose charge was previously not specified (hence, metabolite charges should be curated, see #79)
doc:
- add Zenodo badge (closes #15, closes #65)
- point to GitHub Discussions instead of Gitter (closes #124)
- update contributing guidelines, move to .github folder (closes #133)
chore:
- add model.id field (closes #128)

I hereby confirm that I have:

[x] Tested my code with all requirements for running the model
[x] Selected devel as a target branch (top left drop-down menu)

sulheim commented 3 years ago

[ ] Rerun the memote tests (including the custom growth / knockout phenotypes test) and ensure that the accuracy is the same
[ ] Update memote report

sulheim commented 3 years ago

One concern with the current implementation of the curation script is that it is inconvenient to test v_1_3_0.py, since it needs model version 1.2.0 to run, and the model version currently in the repository is not 1.2.0. Not sure how we should solve this. We could of course save intermediate model versions in an archive that can be used as input? What do you think?

edkerk commented 3 years ago

True. Another strategy would be to have a v1_3_0.py script that gathers the various curations, while having the individual scripts (each referring to one Issue?) in a subfolder. Then, v1_3_0.py could be coded to check that the model "input" is the previous model version (or could even specifically load that release using git), before running all individual scripts and finally generating all output.
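A minimal sketch of the version check described above, assuming the model version is available as a plain string such as "1.2.0". The function names and the `EXPECTED_INPUT_VERSION` constant are hypothetical and not part of the existing scripts; how the version is read from the model file is left open here.

```python
import re

# Assumption: the release that this curation script builds on.
EXPECTED_INPUT_VERSION = "1.2.0"


def parse_version(text):
    """Return (major, minor, patch) from a string such as '1.2.0' or 'v1.2.0'."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"Not a valid version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def check_input_version(found, expected=EXPECTED_INPUT_VERSION):
    """Raise if the input model is not the release the curation expects."""
    if parse_version(found) != parse_version(expected):
        raise RuntimeError(
            f"Curation expects model version {expected}, but got {found}"
        )
    return True
```

The collecting script could call `check_input_version` first and only run the individual curation scripts if it passes.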

sulheim commented 3 years ago

I think it would be too many files if we strictly have one file per issue. However, on a higher level your suggestion might work, e.g. a general reconstruction script that checks the model version before it calls v_1_3_0.py. Actually, I think it would be nice to bring this discussion up in the Standard-GEM development, as I am sure this is not a concern that is limited to the Sco-GEM development. What do you think? I guess this is partially overlapping with https://github.com/MetabolicAtlas/standard-GEM/issues/20

edkerk commented 3 years ago

It is indeed a universal issue, discussion in standard-GEM would make sense.

One script per issue might indeed be too much, perhaps multiple smaller issues could be combined in a script e.g. v1_3_0/misc_curations.py script, while larger issues could have their own dedicated script e.g. v1_3_0/feat_reassign_all_metabolite_annotations.py.

One issue I can see is that the model version number will not be known until PR to master, so instead of v1_3_0/ this folder could be named latest/, and renamed to the desired version number in devel before making PR to master. However, if these paths are also hard-coded in any scripts, then that would also have to be changed.

The easiest way is then not to name scripts by their version number or collect them in folders named with a version number. Instead, the functions and data are in appropriate subfolders of code/ and data/ (for instance ../biomass/ or ../reversibility/), while still having a collecting script v1_3_0.py that checks the model version before calling the appropriate functions.
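As a rough illustration of such a collecting script, the sketch below checks the version and then applies an ordered list of curation functions. Everything here is an assumption for illustration: the model is stood in for by a plain dict, and `fix_reversibility`, `update_biomass` and `run_curations` are hypothetical names, not functions from the repository.

```python
def fix_reversibility(model):
    """Placeholder for a curation function that could live in code/reversibility/."""
    model["log"].append("reversibility")
    return model


def update_biomass(model):
    """Placeholder for a curation function that could live in code/biomass/."""
    model["log"].append("biomass")
    return model


# Hypothetical ordered list of curation steps for one release.
CURATION_STEPS = [fix_reversibility, update_biomass]


def run_curations(model, expected_version, steps=CURATION_STEPS):
    """Check the input version, then apply every curation step in order."""
    if model["version"] != expected_version:
        raise RuntimeError(
            f"Expected model version {expected_version}, got {model['version']}"
        )
    for step in steps:
        model = step(model)
    return model
```

With this layout only the collecting script carries a version number; the functions it calls stay in topic-based subfolders and need no renaming at release time.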

sulheim commented 3 years ago

I still think that it might make sense to have a model/archive folder. A pull request from devel to master that causes a minor bump in version could archive the xml-version of that model. Thus, in the model folder you can always find the latest model version, but if you need the previous version for testing you can easily get it.
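The archiving step could look roughly like the sketch below, which copies the current model XML into an archive folder under a versioned file name. The function name, the naming scheme and the example paths are assumptions, not something that exists in the repository.

```python
import shutil
from pathlib import Path


def archive_model(model_file, archive_dir, version):
    """Copy the model file into archive_dir, adding the version to its name."""
    model_file = Path(model_file)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / f"{model_file.stem}_v{version}{model_file.suffix}"
    if target.exists():
        # Never overwrite an already archived release.
        raise FileExistsError(f"{target} is already archived")
    shutil.copy2(model_file, target)
    return target
```

A call such as `archive_model("model/Sco-GEM.xml", "model/archive", "1.2.0")` (file name assumed) would be run once per release, before the model in model/ is updated.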

edkerk commented 3 years ago

[x] Rerun the memote tests (including the custom growth / knockout phenotypes test) and ensure that the accuracy is the same

[x] Update memote report

Custom tests:
- Knockouts from transposon data: one fewer TP, because the unused gene SCO3946 was present in the previous model version and thereby counted as a "false true positive". This gene is no longer in the model and is therefore not tested for essentiality.
- Knockouts from literature data: one more TP, but also one additional FN: SCO2726 at different carbon sources. Accuracy therefore unchanged.
Memote (run locally):
- Stoichiometric Consistency test has changed (Memote version 0.12.0 vs. 0.9.12). Overall score dropped from 77% to 72%.
- When Memote 0.12.0 is also run on the previous release, the overall score increased from 71% (previous release) to 72%.

Will include code that runs Memote locally with each release.
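One possible shape for that code, as a sketch only: it shells out to the Memote command line (`memote report snapshot`) and so assumes memote is installed in the environment. The helper names and the example file paths are hypothetical.

```python
import subprocess


def memote_snapshot_command(model_path, report_path):
    """Build the command line for a Memote snapshot report."""
    return [
        "memote", "report", "snapshot",
        "--filename", str(report_path),
        str(model_path),
    ]


def run_memote_snapshot(model_path, report_path):
    """Run Memote on the model; raises CalledProcessError if Memote fails."""
    subprocess.run(memote_snapshot_command(model_path, report_path), check=True)
```

For example, `run_memote_snapshot("model/Sco-GEM.xml", "memote_report.html")` (paths assumed) could be called from the release script so that the report is regenerated with every version.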

SysBioChalmers / Sco-GEM

Sco-GEM 1.3.0 #122
