geneontology / pathways2GO

Code for converting between BioPAX pathways and Gene Ontology Causal Activity Models (GO-CAM)

Move source link to comment field; Set contributor to some value in users.yaml #126

Closed dustine32 closed 11 months ago

dustine32 commented 3 years ago

Currently in YeastPathways models, the contributor field is set to the source URL that links back to the pathway at SGD (e.g. https://pathway.yeastgenome.org/YEAST/NEW-IMAGE?object=YEAST-SALV-PYRMID-DNTP), while the source field is set to a CURIE in the format "SGD:[pathway_id]" (e.g. "SGD:YEAST-SALV-PYRMID-DNTP").

We should move this source URL value to the source field.

And then set contributor to a value that is in the go-site users.yaml file, "GOC:sgd_curators". Aligning contributor with users.yaml will allow full functionality of Noctua search and filtering.

For Reactome, we can also move the source URL (pointing to the pathway at Reactome) to the source field. But we will need to find or add some users.yaml entry to set in contributor. Should we copy SGD and make a "GOC:reactome_curators" users.yaml entry?

Originating ticket: https://github.com/geneontology/go-site/issues/1615

dustine32 commented 2 years ago

Referencing https://github.com/geneontology/pathways2GO/issues/37 as it contains some history on how the source field was populated. I'm now realizing this becomes the DB:Reference (aka PMID of evidenced literature) column in GPAD and so should follow the GPAD spec. To conform to the spec, this source needs to be a CURIE.

Note that source currently "works" for Reactome with an example CURIE Reactome:R-HSA-112303 correctly expanding and forwarding to its Reactome pathway page. However, does this "work" for the GPAD reference column? While syntactically correct, is this value meaningful as a literature reference?

For YeastPathways, conveniently, identifiers.org already has sgd.pathways registered so we can just use this. But do we need to add this to a context JSONLD such as go_context.jsonld? Like:

        "sgd.pathways": "https://identifiers.org/sgd.pathways/", 
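For illustration only, here is a minimal Python sketch of how a prefix map like the entry above would expand a CURIE into a URL. The `PREFIXES` dictionary and the `expand_curie` function are hypothetical names made up for this example, not part of any go_context.jsonld tooling; only the `sgd.pathways` entry comes from the proposal above.

```python
# Hypothetical prefix map; only the sgd.pathways entry comes from the proposal above.
PREFIXES = {
    "sgd.pathways": "https://identifiers.org/sgd.pathways/",
}


def expand_curie(curie: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand 'prefix:local_id' into a URL using the prefix map."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not local_id:
        raise ValueError(f"Not a CURIE: {curie!r}")
    if prefix not in prefixes:
        raise KeyError(f"Unknown prefix: {prefix!r}")
    return prefixes[prefix] + local_id


print(expand_curie("sgd.pathways:YEAST-SALV-PYRMID-DNTP"))
```

Without a matching prefix entry the expansion fails, which is why the context question matters if the source CURIE is changed to use `sgd.pathways`.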

@cmungall @kltm What do you think?

dustine32 commented 1 year ago

As discussed on the 2022-09-14 Reactome weeds call with @deustp01 @ukemi @vanaukenk @kltm, Reactome is OK with the contributor field using the GOC:reactome_curators value. Additionally, the full URL (e.g., https://reactome.org/content/detail/R-HSA-71291) will be moved to a comment field. The source field pattern of Reactome: + {pathway ID} (e.g., Reactome:R-HSA-71291) will remain unchanged.

When exporting these GO-CAMs to a standard annotation format file (GAF, GPAD), these fields map to:

GAF 2.2:
contributor -> N/A
source -> reference (column 6)
comment -> N/A

GPAD 2.0:
contributor -> annotation properties (column 12) contributor={value} Ex: contributor=GOC:reactome_curators
source -> reference (column 5)
comment -> annotation properties (column 12) comment={value} Ex: comment=https://reactome.org/content/detail/R-HSA-71291
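The GPAD 2.0 mapping above can be sketched in Python as follows. This is an illustration, not the actual exporter: `gpad_columns` is a hypothetical helper, only columns 5 and 12 are filled in, and joining the properties with `|` assumes the GPAD 2.0 `key=value` pair syntax.

```python
def gpad_columns(contributor: str, source: str, comment: str) -> list[str]:
    """Place GO-CAM model fields into a 12-column GPAD 2.0 row (other columns left empty)."""
    row = [""] * 12
    row[5 - 1] = source  # column 5: reference, expected to be a CURIE
    properties = [f"contributor={contributor}", f"comment={comment}"]
    row[12 - 1] = "|".join(properties)  # column 12: annotation properties
    return row


row = gpad_columns(
    contributor="GOC:reactome_curators",
    source="Reactome:R-HSA-71291",
    comment="https://reactome.org/content/detail/R-HSA-71291",
)
print(row[4])   # reference column
print(row[11])  # annotation properties column
```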

kltm commented 1 year ago

@dustine32 Great--thank you! Looking at the format docs, I think that we're safe here and that linking to Reactome pathway for source is likely the correct thing.

dustine32 commented 1 year ago

After fixing #242, the only remaining aspect of this ticket appears to be to add the URL of the source pathway to a comment field.

suzialeksander commented 1 year ago

@dustine32 I'm looking at dev and these issues seem to be resolved, including that the YeastPathways models now pop up when searching by group, yay.

The opening ticket seemed to indicate the source field ultimately would be a URL, is that still the plan? If not this ticket may be closable. The URL is currently in the comment field.

dustine32 commented 1 year ago

@suzialeksander Apologies, this was a messy ticket. (and thank you for testing!)

Right, we originally intended to put that URL in source but then realized this source value is exported to a GPAD column that requires a CURIE. So instead, we decided on the 2022-09-14 Reactome weeds call to move the URL to a comment field, though that decision was for Reactome models. Is the "URL in comment" solution also OK for YeastPathways models?

suzialeksander commented 1 year ago

URL in comment is great. Should I expect this comment to be visible in the GPAD output?

suzialeksander commented 1 year ago

Decision: we don't necessarily need a working URL from the comments in the GPAD output if we get the DB ref working (https://github.com/geneontology/pathways2GO/issues/254). The URL can remain as a historical note; try to make it un-bot-able, i.e. resistant to being treated as anything more than a comment.

dustine32 commented 1 year ago

@suzialeksander Thanks!

To make it more comment-like, I'll prepend "Imported from YeastPathways: " to the existing URL in the comment. That sound OK?

suzialeksander commented 1 year ago

sounds good to me, @kltm would that solve your concerns about the comment field?

kltm commented 1 year ago

@suzialeksander Good enough for me.

dustine32 commented 1 year ago

Update after discussing with @suzialeksander: Changing "Imported from YeastPathways: " to "Imported from Saccharomyces Genome Database: " to reuse a datasource variable that's already used to populate the model title: https://github.com/geneontology/pathways2GO/blob/700b04755f997088877e11b68c65471846d5ed27/exchange/src/main/java/org/geneontology/gocam/exchange/BioPaxtoGO.java#L182-L184

I was mainly worried about cluttering the code further with alternate "Imported from X" logic.
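To illustrate the single-template idea, here is a minimal Python sketch. The real code is Java in BioPaxtoGO.java; the names `source_comment` and `datasource` below are hypothetical and chosen only for this example. One format string covers every data source, so no per-source "Imported from X" branches are needed.

```python
def source_comment(datasource: str, pathway_url: str) -> str:
    """Build the model comment from the data source name and the pathway URL."""
    return f"Imported from {datasource}: {pathway_url}"


print(source_comment(
    "Saccharomyces Genome Database",
    "https://pathway.yeastgenome.org/YEAST/NEW-IMAGE?object=YEAST-SALV-PYRMID-DNTP",
))
```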