Closed edkerk closed 3 years ago
Additional realisation: the identifiers currently in the BioCyc field are actually MetaCyc identifiers. Tools such as memote (see example report here) check for BioCyc identifiers. To turn MetaCyc identifiers into BioCyc identifiers, they have to be prefixed with `META:`, e.g. `META:F16ALDOLASE-RXN` instead of `F16ALDOLASE-RXN`. With the prefix, the identifiers also follow the specification on Identifiers.org.
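The prefixing described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes annotations are stored as a dict mapping a database key (here `biocyc`) to either one identifier string or a list of them, which may not match how the model file is actually handled in this repository.

```python
def to_biocyc_id(metacyc_id: str, prefix: str = "META:") -> str:
    """Return the identifier with the MetaCyc prefix, adding it only if missing."""
    metacyc_id = metacyc_id.strip()
    if metacyc_id.startswith(prefix):
        return metacyc_id
    return prefix + metacyc_id


def prefix_biocyc_annotation(annotation: dict) -> dict:
    """Return a copy of an annotation dict whose 'biocyc' entries carry the prefix.

    Assumption: values are either a single string or a list of strings.
    """
    updated = dict(annotation)
    value = updated.get("biocyc")
    if isinstance(value, str):
        updated["biocyc"] = to_biocyc_id(value)
    elif isinstance(value, list):
        updated["biocyc"] = [to_biocyc_id(v) for v in value]
    return updated


print(to_biocyc_id("F16ALDOLASE-RXN"))  # META:F16ALDOLASE-RXN
```

Checking for an existing prefix keeps the operation idempotent, so running it twice over the same model would not produce `META:META:...`.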
`META:` prefix.

The GO term and PFAM annotations of the genes will first be discarded (PR #117), as these are assigned inconsistently (only about 200 of 1778 genes have these annotations). Meanwhile, #44 remains open, where additional gene annotations are proposed.
Actually, we shouldn't discard the gene annotations. I haven't checked the 200 genes with annotations, but most genes are annotated to GO/PFAM/etc. in iAA1259. These annotations are valuable and should not be discarded.
These annotations are already in this repository under `data/sulheim2020/annotations/genes.csv`. We can add GO and PFAM annotations from this spreadsheet into the model file.
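As a rough sketch of how annotations could be read from such a spreadsheet before being added to the model: the column names (`gene`, `go`, `pfam`), the `;` separator and the sample rows below are all assumptions made for illustration, not the actual layout of `genes.csv`.

```python
import csv
import io


def read_gene_annotations(handle, id_column="gene", fields=("go", "pfam"), sep=";"):
    """Map gene ID -> {field: [identifiers]} from an open CSV file handle.

    Column names and the separator are hypothetical; adjust them to the
    real spreadsheet. Genes without any annotation are skipped.
    """
    annotations = {}
    for row in csv.DictReader(handle):
        entry = {}
        for field in fields:
            raw = (row.get(field) or "").strip()
            if raw:
                entry[field] = [v.strip() for v in raw.split(sep) if v.strip()]
        if entry:
            annotations[row[id_column]] = entry
    return annotations


# Small in-memory stand-in for the spreadsheet (made-up rows).
sample = io.StringIO(
    "gene,go,pfam\n"
    "geneA,GO:0000001;GO:0000002,PF00001\n"
    "geneB,,\n"
)
print(read_gene_annotations(sample))
```

The resulting mapping could then be merged into each gene's annotation field by whatever tooling writes the model file.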
Ah, we already have that data, great. This will then be done in #44 and not addressed here.
Description of the issue:
Some overlap with #33 and #44, but bigger scope here. If this issue is settled, then the other two can also be closed.
Various (minor) issues exist with current annotations of genes, mets and rxns:
- [x] 1. `metanetx` is sometimes misspelled as `metabetx`.
- [x] 2. About 21 metabolites are annotated to `3dmet`. Either annotate all metabolites (as far as possible) to this database, or drop the annotation of these few metabolites. This is not a database commonly used for GEMs, the model is not using 3D structures of metabolites, so drop the annotation.
- [x] 3. About 19 metabolites are annotated to `cas`. Either annotate all metabolites (as far as possible) to this database, or drop the annotation of these few metabolites. This is not a database commonly used for GEMs, so drop the annotation.
- [x] 4. About 27 metabolites are annotated to `pubchem.substance`. The more common database is `pubchem.compound`. Drop the `pubchem.substance` annotation and make sure that the `pubchem.compound` is included for those metabolites.
- [x] 5. Only N-Acetyl-L-Cysteine is annotated to `kegg.drugs`, while `kegg.compound` is also defined. Drop the `kegg.drugs` annotation.
- [x] 6. Some ChEBI annotations are misrepresented, as `CHEBI%3A15377` instead of `CHEBI:15377`. Each of the metabolites is also annotated with the correct ChEBI annotation: remove the incorrect ones.
- [x] 7. About 17 reactions are annotated to `kegg.ontology`. This is mostly redundant to the more commonly used `kegg.reactions`, which are annotated to 1369 reactions (including the 17 reactions with `kegg.ontology`). Drop the annotation.
- [ ] 8. Of the 1778 genes, 215 are annotated with GO terms. Should this be extended to all genes, or should `go` as annotation be dropped? See #44
- [ ] 9. Of the 1778 genes, 219 are annotated with PFAM domains. Should this be extended to all genes, or should `pfam` as annotation be dropped? See #44

Expected feature/value/output:
To do:
If it is agreed to remove the spurious annotations to uncommon databases, a to-do list will be included here.
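The cleanup proposed in checklist items 1 to 7 could look roughly like the sketch below. It assumes, hypothetically, that each metabolite or reaction carries an annotation dict mapping database keys to one identifier or a list of identifiers, and that ChEBI identifiers live under a `chebi` key. The function names and example values are for illustration only. For item 6 it decodes the percent-encoded form and then removes duplicates, which gives the same result as deleting the incorrect entry when the correct one is already present. It does not look up missing `pubchem.compound` identifiers (item 4).

```python
from urllib.parse import unquote

# Database keys the checklist proposes to drop (items 2, 3, 4, 5 and 7).
DROPPED_KEYS = {"3dmet", "cas", "pubchem.substance", "kegg.drugs", "kegg.ontology"}


def as_list(value):
    """Normalise an annotation value to a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def clean_annotation(annotation: dict) -> dict:
    """Return a cleaned copy of one annotation dict (keys -> lists of IDs)."""
    cleaned = {}
    for key, value in annotation.items():
        # Item 1: repair the misspelled database name.
        key = key.replace("metabetx", "metanetx")
        if key in DROPPED_KEYS:
            continue
        ids = as_list(value)
        if key == "chebi":
            # Item 6: decode percent-encoded IDs such as CHEBI%3A15377.
            ids = [unquote(i) for i in ids]
        merged = cleaned.get(key, []) + ids
        # Remove duplicates while keeping the original order.
        cleaned[key] = list(dict.fromkeys(merged))
    return cleaned


# Illustrative annotation for a single metabolite.
example = {
    "metabetx.chemical": "MNXM2",
    "chebi": ["CHEBI:15377", "CHEBI%3A15377"],
    "cas": "7732-18-5",
    "kegg.compound": "C00001",
}
print(clean_annotation(example))
# {'metanetx.chemical': ['MNXM2'], 'chebi': ['CHEBI:15377'], 'kegg.compound': ['C00001']}
```

Working on a copy rather than editing in place makes it easy to diff the annotations before and after the cleanup.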
I hereby confirm that:
`master` branch of the repository