geneontology / pathways2GO

Code for converting between BioPAX pathways and Gene Ontology Causal Activity Models (GO-CAM)

Extract BP terms via MetaCyc identifiers for SGD pathways #159

Closed dustine32 closed 1 year ago

dustine32 commented 2 years ago

For emitting the correct GO BP term for each pathway, we should grab the MetaCyc ID from the BioPAX (might be something like BioCyc:...) and then xref this to a GO term in the GO ontology.

Tagging @cmungall

dustine32 commented 2 years ago

The SGD pathway BioPAX file basename (and eventual model_id) is the MetaCyc ID that can be looked up in the GO ontology to find BP term(s). Example for SO4ASSIM-PWY.owl:

    <!-- http://purl.obolibrary.org/obo/GO_0019379 -->

    <owl:Class rdf:about="http://purl.obolibrary.org/obo/GO_0019379">
        <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/GO_0000103"/>
        <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/GO_0019419"/>
        <obo1:IAO_0000115 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">The pathway by which inorganic sulfate is processed and incorporated into sulfated compounds, where the phosphoadenylyl sulfate reduction step is catalyzed by the enzyme phosphoadenylyl-sulfate reductase (thioredoxin) (EC:1.8.4.8).</obo1:IAO_0000115>
        <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">MetaCyc:SO4ASSIM-PWY</oboInOwl:hasDbXref>

So, GO:0019379 "sulfate assimilation, phosphoadenylyl sulfate reduction by phosphoadenylyl-sulfate reductase (thioredoxin)" should be the pathway BP.

Will try retrieving this for all pathways and report back which ones either have no mapping or more than one.
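A rough sketch of how that lookup and report could work, in Python. This is not the code that was merged into pathways2GO. The function names (`pathway_id_from_file`, `metacyc_to_go`, `classify`) and the tiny `SAMPLE_OWL` document are made up for illustration. The only assumptions taken from the example above are that the MetaCyc ID is the file basename and that the GO OWL carries `oboInOwl:hasDbXref` values of the form `MetaCyc:<ID>`.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path

OWL_CLASS = "{http://www.w3.org/2002/07/owl#}Class"
RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"
NS = {"oboInOwl": "http://www.geneontology.org/formats/oboInOwl#"}
GO_PREFIX = "http://purl.obolibrary.org/obo/GO_"

# Illustrative stand-in for the GO OWL file; only the xref from the example above.
SAMPLE_OWL = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/GO_0019379">
    <oboInOwl:hasDbXref>MetaCyc:SO4ASSIM-PWY</oboInOwl:hasDbXref>
  </owl:Class>
</rdf:RDF>"""


def pathway_id_from_file(filename):
    """Assumption: the BioPAX file basename is the MetaCyc pathway ID."""
    return Path(filename).stem


def metacyc_to_go(owl_xml):
    """Map each MetaCyc ID to the list of GO IDs that xref it."""
    mapping = defaultdict(list)
    root = ET.fromstring(owl_xml)
    for cls in root.iter(OWL_CLASS):
        iri = cls.get(RDF_ABOUT, "")
        if not iri.startswith(GO_PREFIX):
            continue  # skips anonymous classes and non-GO terms
        go_id = "GO:" + iri[len(GO_PREFIX):]
        for xref in cls.findall("oboInOwl:hasDbXref", NS):
            value = (xref.text or "").strip()
            if value.startswith("MetaCyc:"):
                mapping[value[len("MetaCyc:"):]].append(go_id)
    return dict(mapping)


def classify(pathway_ids, mapping):
    """Return (IDs with no GO term, IDs with more than one GO term)."""
    unmapped = [p for p in pathway_ids if not mapping.get(p)]
    ambiguous = {p: mapping[p] for p in pathway_ids if len(mapping.get(p, [])) > 1}
    return unmapped, ambiguous
```

For the real ontology file one would probably read from disk with `ET.parse` (or `ET.iterparse`, given the file size) rather than from a string.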

dustine32 commented 2 years ago

Results!:

yeast_pathways_no_gos.txt

cmungall commented 2 years ago

oh ok... it seems kinda hacky to get it from the filename but I guess if they don't actually put it in the biopax that's what we've got to do!

Looking at the unmapped ones:

Some seem to be specific to yeastpathways and not in metacyc

YEAST-GALACT-METAB-PWY

=>

https://pathway.yeastgenome.org/YEAST/NEW-IMAGE?type=PATHWAY&object=YEAST-GALACT-METAB-PWY

at the top we have "galactose degradation"

if you can give me all these labels I can map them to GO

e.g. this one to:

    id: GO:0019388
    name: galactose catabolic process
    namespace: biological_process
    def: "The chemical reactions and pathways resulting in the breakdown of galactose, the aldohexose galacto-hexose." [ISBN:0198506732]
    synonym: "galactose breakdown" EXACT []
    synonym: "galactose catabolism" EXACT []
    synonym: "galactose degradation" EXACT []
    xref: MetaCyc:GALDEG-PWY
    intersection_of: GO:0009056 ! catabolic process
    intersection_of: has_primary_input CHEBI:28260 ! galactose

Now weirdly that GO term is already mapped to a MetaCyc pathway that is restricted to proteobacteria: https://metacyc.org/META/new-image?type=PATHWAY&object=GALDEG-PWY

I think that GO term should be mapped to the more generic:

https://metacyc.org/META/NEW-IMAGE?type=ECOCYC-CLASS&object=GALACTOSE-DEGRADATION

I think the metacyc xrefs in GO are a bit suspect so I'd like to check your positive list too

dustine32 commented 2 years ago

@cmungall Attached is the full list TSV of pathways with columns:

  1. Pathway ID
  2. Pathway name/label
  3. GO term mapped via pathway filename/ID

yeast_pathway_id_labels_gos.txt

So this is the combined positive (if GO in col3) and negative list.
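If it helps with review, the combined list could be split back into its positive and negative halves with something like the sketch below. The helper name `split_mapped` is hypothetical. It assumes the file has no header row and that an unmapped pathway has an empty or missing third column. The sample rows are illustrative and not copied from the attachment.

```python
import csv
import io


def split_mapped(tsv_text):
    """Split (pathway ID, label, GO term) rows into mapped and unmapped lists."""
    mapped, unmapped = [], []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # ignore blank lines
        row = row + [""] * (3 - len(row))  # pad rows that lack a GO column
        pathway_id, label, go_term = (cell.strip() for cell in row[:3])
        target = mapped if go_term else unmapped
        target.append((pathway_id, label, go_term))
    return mapped, unmapped


# Illustrative rows only; the first label is a placeholder.
SAMPLE = (
    "SO4ASSIM-PWY\t(label not shown in thread)\tGO:0019379\n"
    "YEAST-GALACT-METAB-PWY\tgalactose degradation\t\n"
)

mapped, unmapped = split_mapped(SAMPLE)
```

With the sample rows, `mapped` holds the SO4ASSIM-PWY row and `unmapped` holds the YEAST-GALACT-METAB-PWY row.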

dustine32 commented 2 years ago

Tagging @thomaspd

dustine32 commented 1 year ago

Closing this for now since I just merged the code to pull these mappings from the upstream GO. Mappings can be added/improved in the GO itself.