geneontology / noctua

Graph-based modeling environment for biology, including prototype editor and services
http://noctua.geneontology.org/
BSD 3-Clause "New" or "Revised" License

Update MF-MF relations in existing models #813

Closed vanaukenk closed 1 year ago

vanaukenk commented 1 year ago

This is a stub ticket for starting to spell out the steps we'll need to take to update MF-MF relations in existing models as a result of the work on the MF Relations Project.

Generally, the work that needs to be done is:

balhoff commented 1 year ago

@vanaukenk here are some counts for existing data. The rows may overlap (e.g., the "molecular_function" rows count every triple whose subject is an instance of "molecular_function", including its more specific subclasses, with the given predicate). The object (not in the table) is always some instance of "molecular_function".

| ?subject_type | ?subject_type_label | ?p | ?p_label | ?triples |
|---|---|---|---|---|
| <http://purl.obolibrary.org/obo/GO_0003674> | "molecular_function" | <http://purl.obolibrary.org/obo/RO_0002213> | "positively regulates" | "1515" |
| <http://purl.obolibrary.org/obo/GO_0003674> | "molecular_function" | <http://purl.obolibrary.org/obo/RO_0002212> | "negatively regulates" | "577" |
| <http://purl.obolibrary.org/obo/GO_0001216> | "DNA-binding transcription activator activity" | <http://purl.obolibrary.org/obo/RO_0002213> | "positively regulates" | "49" |
| <http://purl.obolibrary.org/obo/GO_0030371> | "translation repressor activity" | <http://purl.obolibrary.org/obo/RO_0002212> | "negatively regulates" | "1" |
| <http://purl.obolibrary.org/obo/GO_0003674> | "molecular_function" | <http://purl.obolibrary.org/obo/RO_0002211> | "regulates" | "27" |
| <http://purl.obolibrary.org/obo/GO_0001217> | "DNA-binding transcription repressor activity" | <http://purl.obolibrary.org/obo/RO_0002212> | "negatively regulates" | "2" |
| <http://purl.obolibrary.org/obo/GO_0001217> | "DNA-binding transcription repressor activity" | <http://purl.obolibrary.org/obo/RO_0002213> | "positively regulates" | "1" |
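Why the rows can overlap may be easier to see in a small Python sketch. It assumes each triple's subject has already been resolved to its asserted class and that a hypothetical `SUPERCLASSES` map lists each class together with its ancestors. The sample triples are made up for illustration and are not taken from any model; this is not the query that produced the counts.

```python
from collections import Counter

# Hypothetical ancestor map: each class maps to itself plus its superclasses.
SUPERCLASSES = {
    "GO:0001216": {"GO:0001216", "GO:0003674"},
    "GO:0003674": {"GO:0003674"},
}

def count_by_type_and_predicate(triples):
    """Count (subject type, predicate) pairs, crediting every ancestor type.

    `triples` is an iterable of (subject_class, predicate) pairs for triples
    whose object is an instance of molecular_function.
    """
    counts = Counter()
    for subject_class, predicate in triples:
        for ancestor in SUPERCLASSES.get(subject_class, {subject_class}):
            counts[(ancestor, predicate)] += 1
    return counts

# Illustrative data only.
sample = [
    ("GO:0001216", "RO:0002213"),
    ("GO:0003674", "RO:0002213"),
]
counts = count_by_type_and_predicate(sample)
```

In this sketch the GO:0001216 triple is counted once under its own class and once under GO:0003674, which is the kind of overlap described for the table.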
vanaukenk commented 1 year ago

Cool. Thanks, @balhoff. It looks like computationally updating the MF-MF relations in general would be very helpful to curators, as would handling the special case of DNA-binding transcription activator activity. The others could just be addressed manually by curators.

vanaukenk commented 1 year ago

I updated the replace relations tsv to include 'directly activates' and 'directly inhibits' in the list of relations to update.

https://github.com/geneontology/noctua-models-migrations/pull/19

vanaukenk commented 1 year ago

@balhoff

Talking with @ukemi and @pgaudet this morning, we think it would be very helpful to have a list of the model ids that will be affected by the MF-MF relations replacement as well as the ProvidedBy value for each model.

The overall numbers seem a bit higher than we might have expected, so it'd be good to know exactly what will be touched if we do this.

This would also help us communicate with relevant groups about the upcoming changes.

Thank you!

balhoff commented 1 year ago

@vanaukenk here's a new query output: https://docs.google.com/spreadsheets/d/1wGwXS6Kc7zWvmCvp4HwC-8AjHNUYn12etLyun3osOZI/edit?usp=sharing

Let me know if you'd like any modifications.

vanaukenk commented 1 year ago

Fantastic! Thanks very much @balhoff

vanaukenk commented 1 year ago

From 2023-02-22 manager's call:

We'll aim to make these updates on the 2023-03-09 Noctua maintenance outage.

@balhoff @kltm

vanaukenk commented 1 year ago

@kltm @balhoff Just checking in about next week's data relations updates.

Is it still feasible to update the models on noctua-dev early next week so we have time to QC before Thursday evening's outage and updates on production noctua?

Thx.

kltm commented 1 year ago

We'll need the SPARQL and instructions, but can put it in on Monday or Tuesday at short notice.

balhoff commented 1 year ago

@kltm @vanaukenk sorry for the delay—I will try to get the SPARQL to you later today.

balhoff commented 1 year ago

@vanaukenk clarification about directly activates and directly inhibits... does the subject type matter at all for those? e.g., if subtype of 'translation activator activity'

balhoff commented 1 year ago

And what about regulates itself? Should it be updated to directly regulates and/or indirectly regulates?

vanaukenk commented 1 year ago

@balhoff

> @vanaukenk clarification about directly activates and directly inhibits... does the subject type matter at all for those? e.g., if subtype of 'translation activator activity'

In theory, yes, the subject type would matter here, too. In practice, I don't know how many models actually use 'directly activates' and/or 'directly inhibits'.

If it's not too much trouble, I could take a look at any models using those two relations to see if we need to be more specific about the subject for the updates.

> And what about regulates itself? Should it be updated to directly regulates and/or indirectly regulates?

Yes, for regulates without a directionality we would have to update to directly or indirectly depending on the subject type. I could also take a look at these models, but suspect there are more of them than the directly activates and directly inhibits models.

For future GO-CAM reports, it would be nice to know how many models use the regulates relation without directionality, since we would want curators to review those and choose a more granular relation.
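A report like that could be sketched in Python roughly as follows. This assumes the model triples have been exported as `(model_id, subject, predicate, object)` tuples; the export format, the function name, and the model ids in the sample are hypothetical. RO:0002211 is the 'regulates' relation from the counts table earlier in this thread.

```python
REGULATES = "RO:0002211"  # 'regulates' with no directionality

def models_needing_review(triples):
    """Return the sorted ids of models with at least one bare 'regulates' edge.

    `triples` is an iterable of (model_id, subject, predicate, object) tuples.
    """
    return sorted({model_id for model_id, _s, p, _o in triples if p == REGULATES})

# Illustrative data only; these model ids are placeholders.
sample = [
    ("gomodel:EXAMPLE_A", "mf1", "RO:0002211", "mf2"),
    ("gomodel:EXAMPLE_A", "mf2", "RO:0002213", "mf3"),
    ("gomodel:EXAMPLE_B", "mf4", "RO:0002212", "mf5"),
]
review_list = models_needing_review(sample)
```

The length of `review_list` would give the number of models for curators to revisit.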

balhoff commented 1 year ago

@vanaukenk I queried and did not find any usages of directly activates or directly inhibits.

I think this is wrong; will confirm shortly.

balhoff commented 1 year ago

@vanaukenk here are the directly activates/inhibits models: https://docs.google.com/spreadsheets/d/1FRGgUliwUpxc7AUMwTKjAxmnuXslgwkmTEU9M1vZ7HE/edit?usp=sharing

It doesn't look like any of them use your special-case classes.

I initially didn't find any because there was no label for those relations (they are obsolete and weren't loaded).
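The general pitfall can be sketched in Python: a lookup that goes through labels silently drops any relation that has no label loaded, while a lookup by identifier still finds it. The identifier `EX:directly_activates` and the tiny `LABELS` table are placeholders for illustration, not the real relation IRI or the real label store.

```python
# Labels available to the query; obsolete relations are assumed to be missing.
LABELS = {"RO:0002213": "positively regulates"}

def find_by_label(triples, label):
    """Match on the predicate's label; unlabeled predicates never match."""
    return [t for t in triples if LABELS.get(t[1]) == label]

def find_by_id(triples, predicate_id):
    """Match on the predicate identifier itself, independent of labels."""
    return [t for t in triples if t[1] == predicate_id]

# Illustrative data only.
triples = [
    ("mf1", "EX:directly_activates", "mf2"),
    ("mf3", "RO:0002213", "mf4"),
]
by_label = find_by_label(triples, "directly activates")
by_id = find_by_id(triples, "EX:directly_activates")
```

Here `by_label` comes back empty even though the edge exists, which mirrors the empty first result described above.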

vanaukenk commented 1 year ago

Thanks for the spreadsheet @balhoff

I've checked all the models and, with two exceptions that should be manually updated, I think we are okay to bulk update the models that use 'directly activates' and 'directly inhibits'.

The two that should be reviewed manually are:

~http://noctua.geneontology.org/editor/graph/gomodel:57c82fad00000403~ @ukemi DONE

~http://noctua.geneontology.org/editor/graph/gomodel:586fc17a00001662~ @vanaukenk DONE

vanaukenk commented 1 year ago

@balhoff @kltm

I'm checking models on noctua-dev.

So far, the updates for the negatively regulates relation between MFs look okay, but for the positively regulates relations, it looks like the existing relation isn't getting deleted and the new relation is just getting appended. Here's a view (the relations are overlapping a bit in the view, but both are there):

[image: model view showing both the old and the new relation]

vanaukenk commented 1 year ago

Also, what happens to the evidence on the relation when we update? On noctua-dev it looks like we might be losing it :-(
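The two behaviors being checked here (the old relation is removed rather than kept alongside the new one, and the evidence moves with the edge) can be stated as a small Python sketch. It assumes a simplified in-memory representation where each edge `(subject, relation, object)` maps to its list of evidence annotations; this is not how the actual migration query is written, and the relation labels and evidence strings are placeholders.

```python
def replace_relation(edges, old, new):
    """Return a new edge map with `old` relations replaced by `new`.

    `edges` maps (subject, relation, object) -> list of evidence annotations.
    The old edge is dropped, and its evidence is carried over to the new edge.
    """
    updated = {}
    for (subject, relation, obj), evidence in edges.items():
        key = (subject, new, obj) if relation == old else (subject, relation, obj)
        # setdefault/extend merges evidence if the new edge already exists.
        updated.setdefault(key, []).extend(evidence)
    return updated

# Illustrative data only.
edges = {
    ("mf1", "positively regulates", "mf2"): ["evidence-1"],
    ("mf2", "negatively regulates", "mf3"): ["evidence-2"],
}
result = replace_relation(edges, "positively regulates", "directly positively regulates")
```

A QC pass on noctua-dev amounts to checking the same two conditions on real models: no leftover old edge, and no edge that lost its evidence.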

vanaukenk commented 1 year ago

Here's a simpler example from http://noctua-dev.berkeleybop.org/editor/graph/gomodel:a1a2a3a402

[image: simpler example from the noctua-dev model above]

balhoff commented 1 year ago

Thanks for checking the output—I updated the query to fix this problem.

vanaukenk commented 1 year ago

@balhoff @kltm

Hi - there's another thing I'd like to double-check.

In cases where we have a replacement for a more specific MF, e.g. DNA-binding transcription activator activity, RNA polymerase II specific (GO:0001228), it looks like we are getting two relations: one direct and one indirect.

Here's a model on production (http://noctua.geneontology.org/editor/graph/gomodel:5fa76ad400000000):

[image: the model on production]

and here's what it looks like on noctua-dev (http://noctua-dev.berkeleybop.org/editor/graph/gomodel:5fa76ad400000000):

[image: the same model on noctua-dev]

The 'indirectly positively regulates' relation is the one we want in this case.
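One way to avoid the double replacement is to apply only the most specific rule that matches the subject's class. The Python sketch below illustrates that idea under several assumptions: the `RULES` and `LINEAGE` tables are hypothetical, the lineages are simplified (intermediate GO classes are omitted), and the 'directly positively regulates' target for the general case is inferred from this thread rather than copied from the migration file.

```python
# (subject class, old relation) -> new relation. Assumed for illustration.
RULES = {
    ("GO:0001216", "positively regulates"): "indirectly positively regulates",
    ("GO:0003674", "positively regulates"): "directly positively regulates",
}

# Simplified lineage per class, ordered from most specific to most general.
LINEAGE = {
    "GO:0001228": ["GO:0001228", "GO:0001216", "GO:0003674"],
    "GO:0004672": ["GO:0004672", "GO:0003674"],
}

def pick_replacement(subject_class, relation):
    """Return the replacement from the most specific matching rule, or None."""
    for cls in LINEAGE.get(subject_class, [subject_class]):
        new_relation = RULES.get((cls, relation))
        if new_relation is not None:
            return new_relation  # stop at the first (most specific) match
    return None

tf_result = pick_replacement("GO:0001228", "positively regulates")
general_result = pick_replacement("GO:0004672", "positively regulates")
```

With this ordering, a GO:0001228 subject gets only the indirect relation, while a molecular function outside the transcription factor branch falls through to the general rule.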

vanaukenk commented 1 year ago

All of the models I checked this morning look okay.

Noting, though, that curators will need to manually review models with non-directional transcription factor MFs.

Thanks @kltm @balhoff