SysBioChalmers / Sco-GEM

The consensus GEM for Streptomyces coelicolor -
https://sysbiochalmers.github.io/Sco-GEM/
Creative Commons Attribution 4.0 International

fix: clean-up annotations #105

Closed edkerk closed 3 years ago

edkerk commented 4 years ago

Description of the issue:

Some overlap with #33 and #44, but bigger scope here. If this issue is settled, then the other two can also be closed.

Various (minor) issues exist with current annotations of genes, mets and rxns:

Expected feature/value/output:

To do:

If we agree to remove the spurious annotations to uncommon databases, a to-do list will be included here.


edkerk commented 4 years ago

Additional realisation: the identifiers currently in the BioCyc field are actually MetaCyc identifiers, whereas e.g. memote (see example report here) checks for BioCyc identifiers. To turn MetaCyc identifiers into BioCyc identifiers, they have to be prefixed with META: (including the colon).

The prefixed identifiers then also follow the specification on Identifiers.org.
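
The prefixing could be sketched roughly as below. This is only an illustration, assuming annotations are held as plain dictionaries keyed by database name (similar to COBRApy-style annotation fields) and that the field is called "biocyc". The function name metacyc_to_biocyc and the sample identifiers are hypothetical, not taken from the repository.

```python
def metacyc_to_biocyc(annotation, key="biocyc", prefix="META:"):
    """Return a copy of `annotation` with the IDs under `key` prefixed.

    Values may be a single string or a list of strings. Identifiers that
    already start with the prefix are left unchanged, so the function can
    be applied repeatedly without producing "META:META:..." entries.
    """
    updated = dict(annotation)
    if key not in annotation:
        return updated

    def add_prefix(identifier):
        if identifier.startswith(prefix):
            return identifier
        return prefix + identifier

    value = annotation[key]
    if isinstance(value, str):
        updated[key] = add_prefix(value)
    else:
        updated[key] = [add_prefix(v) for v in value]
    return updated


# Illustrative input only; the key names and IDs are assumptions.
example = {"biocyc": ["PYRUVATE", "META:CPD-123"], "kegg.compound": "C00022"}
converted = metacyc_to_biocyc(example)
```

Returning a copy rather than editing in place keeps the original annotations available for comparison before anything is written back to the model file.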

sulheim commented 4 years ago

edkerk commented 3 years ago

The GO term and PFAM annotations to the genes will first be discarded (PR #117), as these are assigned inconsistently (only about 200 of 1778 genes have these annotations). Meanwhile #44 remains open, where additional gene annotations are proposed.
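
A coverage figure like the one quoted above could be checked with a small helper along these lines. This is a sketch under assumptions: gene annotations are represented as a dictionary of dictionaries, and the key names "go" and "pfam" as well as the gene IDs in the sample are hypothetical.

```python
def annotation_coverage(gene_annotations, keys=("go", "pfam")):
    """Count genes carrying a non-empty value for at least one of `keys`.

    Returns a tuple (annotated, total).
    """
    annotated = sum(
        1
        for annotation in gene_annotations.values()
        if any(annotation.get(key) for key in keys)
    )
    return annotated, len(gene_annotations)


# Illustrative input only; not data from the model.
genes = {
    "SCO0001": {"go": ["GO:0008150"]},
    "SCO0002": {"pfam": []},  # an empty list does not count as annotated
    "SCO0003": {"kegg.genes": "sco:SCO0003"},
}
annotated, total = annotation_coverage(genes)
```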

sulheim commented 3 years ago

Actually, we shouldn't discard the gene annotations. I haven't checked the 200 genes with annotations, but most genes are annotated with GO, PFAM, etc. in iAA1259. These annotations are valuable and should not be discarded.

sulheim commented 3 years ago

These annotations are already in this repository under data/sulheim2020/annotations/genes.csv. We can add GO and PFAM annotations from this spreadsheet into the model file.
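
One possible way to do this is sketched below. The layout of genes.csv is not described in this thread, so the column names ("gene", "go", "pfam"), the ";" separator for multiple values, and the sample rows are all assumptions made for illustration. The model genes are represented as plain dictionaries rather than through a specific modelling library.

```python
import csv
import io


def load_gene_annotations(handle, id_col="gene", fields=("go", "pfam"), sep=";"):
    """Read a CSV into {gene_id: {field: [values]}}, skipping empty cells."""
    table = {}
    for row in csv.DictReader(handle):
        entry = {}
        for field in fields:
            cell = row.get(field) or ""
            values = [v.strip() for v in cell.split(sep) if v.strip()]
            if values:
                entry[field] = values
        if entry:
            table[row[id_col]] = entry
    return table


def merge_annotations(model_genes, table):
    """Add annotations from `table` to `model_genes` in place.

    Existing fields are never overwritten, and genes absent from the model
    are ignored. Returns the number of genes that gained a field.
    """
    updated = 0
    for gene_id, new_fields in table.items():
        if gene_id not in model_genes:
            continue
        changed = False
        for field, values in new_fields.items():
            if field not in model_genes[gene_id]:
                model_genes[gene_id][field] = values
                changed = True
        if changed:
            updated += 1
    return updated


# Illustrative data only; the real file is data/sulheim2020/annotations/genes.csv.
sample_csv = (
    "gene,go,pfam\n"
    "SCO0001,GO:0008150;GO:0003674,PF00001\n"
    "SCO0002,,\n"
    "SCO9999,GO:0008150,\n"
)
model_genes = {"SCO0001": {"pfam": ["PF00002"]}, "SCO0002": {}}
table = load_gene_annotations(io.StringIO(sample_csv))
n_updated = merge_annotations(model_genes, table)
```

Not overwriting existing fields is a deliberately conservative choice here, so that curated annotations already in the model take precedence over the spreadsheet.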

edkerk commented 3 years ago

Ah, we already have that data, great. This will then be done in #44 and not addressed here.