geneontology / pathways2GO

Code for converting between BioPAX pathways and Gene Ontology Causal Activity Models (GO-CAM)

Imported pathways missing protein/activity unit, inputs, chemicals #269

Open suzialeksander opened 11 months ago

suzialeksander commented 11 months ago

Looking at http://noctua-dev.berkeleybop.org/workbench/noctua-visual-pathway-editor/?model_id=gomodel%3AYeastPathways_PWY3O-402 and even http://noctua-dev.berkeleybop.org/workbench/annpreview/?model_id=gomodel:YeastPathways_PWY3O-402, we don't see KCS1 at all. It looks like the entire step for D-myo-inositol 1,3,4,5,6-pentakisphosphate was omitted, even though myo-inositol 1,3,4,5,6-pentakisphosphate appears as an output for ARG82.


Full pathway on SGD at https://pathway.yeastgenome.org/YEAST/new-image?type=PATHWAY&object=PWY3O-402&detail-level=2

If this is just a one-off or affects a small number of pathways, we are happy to edit once it's in Noctua.

suzialeksander commented 11 months ago

In 'superpathway of methionine salvage pathway - imported from: Saccharomyces Genome Database' (http://noctua-dev.berkeleybop.org/editor/graph/gomodel:YeastPathways_PWY3O-351), two gene products are missing. MRI1 is not included as an 'enabled by' entry for [GO:0046523] S-methyl-5-thioribose-1-phosphate isomerase activity, and MDE1 is not included as an 'enabled by' entry for [GO:0046570] methylthioribulose 1-phosphate dehydratase activity.

suzialeksander commented 11 months ago

The inositol phosphate biosynthesis model in Noctua is missing inputs and chemicals.

http://noctua-dev.berkeleybop.org/workbench/noctua-visual-pathway-editor/?model_id=gomodel%3AYeastPathways_PWY3O-402

vs.

https://pathway.yeastgenome.org/YEAST/NEW-IMAGE?type=PATHWAY&object=PWY3O-402&detail-level=2

dustine32 commented 11 months ago

@suzialeksander Ok, looks like the enzyme/enabler issue is something I can fix by just adding some missing xrefs to the hacky lookup table sgd_MONOMER3O_uniprot_lkp.tsv. This was the case for KCS1 in PWY3O-402 and for MRI1 and MDE1 in PWY3O-351.

The other entity classes, like small molecules/compounds, will require some curator expertise. The xref data in the BioPAX doesn't have ChEBI IDs for many (134 total) of these, and that's what's causing the fallback to "chemical entity", as in the above case of the output "a diphospho-1D-myo-inositol tetrakisphosphate" in PWY3O-402.

To fix, we'll need to make another lookup table of some sort to match each chemical's current xref value to its correct ChEBI ID. For our "a diphospho-1D-myo-inositol tetrakisphosphate" example, the BioPAX has xref YeastCyc:DIPHOSPHO-1D-MYO-INOSITOL-TETRAKISPHOSPH and we can manually search the YeastPathways site with "DIPHOSPHO-1D-MYO-INOSITOL-TETRAKISPHOSPH" to get the page for this chemical. Unfortunately, this page only has a PubChem xref. I'm not aware of how to get the ChEBI ID from this?

Would someone be available to go through this list (on the GO drive in the YeastPathways folder) and fill in the ChEBI ID column?
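For illustration, the proposed lookup table could be consumed along these lines. This is only a sketch in Python (not the converter's actual code): the column names `xref` and `chebi_id`, the helper names, and the sample rows are all hypothetical, and the ChEBI ID shown is a placeholder, not a real identifier.

```python
import csv
import io

# Label used when no ChEBI ID is known, mirroring the fallback described above.
FALLBACK = "chemical entity"


def load_chebi_lookup(handle):
    """Read a TSV with columns 'xref' and 'chebi_id' (hypothetical names)
    into a dict, skipping rows whose ChEBI column is still empty."""
    lookup = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        chebi_id = (row.get("chebi_id") or "").strip()
        if chebi_id:
            lookup[row["xref"].strip()] = chebi_id
    return lookup


def resolve_chemical(xref, lookup):
    """Return the curated ChEBI ID for an xref, or the generic fallback."""
    return lookup.get(xref, FALLBACK)


# Made-up sample rows; the second one has not been curated yet.
SAMPLE_TSV = (
    "xref\tchebi_id\n"
    "YeastCyc:EXAMPLE-COMPOUND\tCHEBI:0000000\n"
    "YeastCyc:DIPHOSPHO-1D-MYO-INOSITOL-TETRAKISPHOSPH\t\n"
)

lookup = load_chebi_lookup(io.StringIO(SAMPLE_TSV))
curated = resolve_chemical("YeastCyc:EXAMPLE-COMPOUND", lookup)
uncurated = resolve_chemical(
    "YeastCyc:DIPHOSPHO-1D-MYO-INOSITOL-TETRAKISPHOSPH", lookup
)
print(curated)
print(uncurated)
```

Rows with an empty ChEBI column are skipped on load, so partially curated versions of the list could be used as they are filled in.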

deustp01 commented 11 months ago

> Would someone be available to go through this list (on the GO drive in the YeastPathways folder) and fill in the ChEBI ID column?

It would help the unfortunate person stuck with this job if any cross-references to other databases like PubChem were extracted from MetaCyc and attached to the items on the list. MetaCyc is missing a lot of ChEBI cross-references, I think for historical reasons (annotations were done before ChEBI existed and updating has been hard to support), but these others could be used as hints.

Oddly, one item on the list that I looked up at random, acetoin, has a MetaCyc entry that includes as one of its attributes a "unification link" to ChEBI. Are existing cross-references not getting into the MetaCyc BioPAX file, or not being found there?
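One way to check which cross-references actually reach the BioPAX file is to list the xref databases attached to each small molecule. The following is a rough Python sketch, assuming a BioPAX Level 3 RDF/XML file in which each `SmallMoleculeReference` points to its xrefs via `rdf:resource` links; files that nest the xref elements inline would need extra handling. The function name and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

BP = "http://www.biopax.org/release/biopax-level3.owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def xref_dbs_by_molecule(biopax_xml):
    """Map each SmallMoleculeReference rdf:ID to the set of xref database names."""
    root = ET.fromstring(biopax_xml)

    # Index every xref element by its rdf:ID so links can be resolved.
    db_by_xref_id = {}
    for tag in ("UnificationXref", "RelationshipXref"):
        for xref in root.iter(f"{{{BP}}}{tag}"):
            db = xref.findtext(f"{{{BP}}}db", default="").strip()
            db_by_xref_id[xref.get(f"{{{RDF}}}ID")] = db

    result = {}
    for ref in root.iter(f"{{{BP}}}SmallMoleculeReference"):
        dbs = result.setdefault(ref.get(f"{{{RDF}}}ID"), set())
        for link in ref.findall(f"{{{BP}}}xref"):
            target = (link.get(f"{{{RDF}}}resource") or "").lstrip("#")
            if target in db_by_xref_id:
                dbs.add(db_by_xref_id[target])
    return result


# Made-up sample document; IDs are placeholders, not real identifiers.
SAMPLE_BIOPAX = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bp="http://www.biopax.org/release/biopax-level3.owl#">
  <bp:SmallMoleculeReference rdf:ID="SmallMoleculeReference1">
    <bp:xref rdf:resource="#UnificationXref1"/>
  </bp:SmallMoleculeReference>
  <bp:SmallMoleculeReference rdf:ID="SmallMoleculeReference2">
    <bp:xref rdf:resource="#UnificationXref2"/>
  </bp:SmallMoleculeReference>
  <bp:UnificationXref rdf:ID="UnificationXref1">
    <bp:db>ChEBI</bp:db><bp:id>0000000</bp:id>
  </bp:UnificationXref>
  <bp:UnificationXref rdf:ID="UnificationXref2">
    <bp:db>PubChem</bp:db><bp:id>0000000</bp:id>
  </bp:UnificationXref>
</rdf:RDF>"""

dbs_by_molecule = xref_dbs_by_molecule(SAMPLE_BIOPAX)
missing_chebi = [
    name
    for name, dbs in dbs_by_molecule.items()
    if "chebi" not in {db.lower() for db in dbs}
]
print(missing_chebi)
```

Run against the real file, a listing like `missing_chebi` would show whether acetoin's ChEBI link is absent from the export or present but not being picked up by the converter.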

deustp01 commented 11 months ago

> this page only has a PubChem xref. I'm not aware of how to get the ChEBI ID from this?

The PubChem page for the compound often includes a cross-reference to the ChEBI page for the compound. I do the look-ups by hand, but perhaps a script could do this better, faster, and less painfully?
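A script for this look-up might look roughly like the Python sketch below. It assumes the PubChem PUG REST endpoint for registry ID cross-references and a JSON response with an `InformationList` / `Information` / `RegistryID` layout; both are recalled from the PUG REST documentation and should be verified before relying on them. The function names and the sample payload are made up, and any hits would still need a curator's eye.

```python
import json
import urllib.request

# Assumed PUG REST URL pattern for registry-ID cross-references of a compound.
PUG_URL = (
    "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/"
    "{cid}/xrefs/RegistryID/JSON"
)


def extract_chebi_ids(payload):
    """Pull ChEBI-style registry IDs out of a parsed PUG REST response."""
    ids = []
    for info in payload.get("InformationList", {}).get("Information", []):
        for registry_id in info.get("RegistryID", []):
            is_chebi = registry_id.upper().startswith("CHEBI:")
            if is_chebi and registry_id not in ids:
                ids.append(registry_id)
    return ids


def fetch_chebi_ids(cid, timeout=30):
    """Query PubChem for one compound ID (needs network access)."""
    url = PUG_URL.format(cid=cid)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return extract_chebi_ids(json.load(response))


# Made-up payload in the assumed response shape; all IDs are placeholders.
SAMPLE_PAYLOAD = {
    "InformationList": {
        "Information": [
            {"CID": 0, "RegistryID": ["CHEBI:0000000", "OTHER-0000", "CHEBI:0000000"]}
        ]
    }
}

sample_ids = extract_chebi_ids(SAMPLE_PAYLOAD)
print(sample_ids)
```

Keeping the parsing in `extract_chebi_ids` separate from the network call makes it possible to check the logic offline; `fetch_chebi_ids` is not called here because it requires a live connection.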