geneontology / noctua-models

This is the data repository for the models created and edited with the Noctua tool stack for GO.
http://noctua.geneontology.org/
Creative Commons Attribution 4.0 International

Load remaining ZFIN Models into noctua #170

Closed: sierra-moxon closed this issue 2 years ago

sierra-moxon commented 3 years ago

From @kltm: It looks like on the next iteration, the ZFIN group, species, and a different contributor might be good additions; they are a little hard to find right now.

kltm commented 3 years ago

Also tagging @dustine32 for this location. Please feel free to move this to any convenient repo.

sabrinatoro commented 3 years ago

@dustine32 - I tested the ZFIN models, and here are the issues I found (note that some of these might have already been reported):

1) The contributor is incorrect for all models: it should be “ZFIN” or something like “GOC:zfin_curators”.

2) “NOT” annotations are not reported as “not”; they are reported as “regular/positive” annotations. Note: I have an example in which there are both “yes” and “not” annotations to the same term, for example ZFIN:ZDB-GENE-000616-1.

3) ZFIN-PUB-### IDs that do not have a corresponding PMID are not reported in the model reference.

4) “binding” term with IPI: the gene in the “with” field is displayed in the model as “has input” (which is correct), but the “with” field is missing from the evidence in the model.

5) “binding” term with IPI: when the ID in the “with” field is not in the GPI, this ID is not reported as “has input”. Could we show this ID in “has input” even though it might not be in Neo yet?

6) Same as 5, but the ID in the “with” field refers to a non-ZFIN gene.

7) Some of the “boxes” show 2 dates, where their evidences have different dates.

examples can be found here: https://docs.google.com/spreadsheets/d/1o5Wa0T16RR2WF-bI_JqTsczjiszHki4vnp49fM1_mpg/edit?usp=sharing

Note: I checked the term-GP, the date, the model name, etc., and everything looks OK.

dustine32 commented 3 years ago
  1. the contributor is incorrect for all models: it should be “ZFIN” or something like “GOC:zfin_curators”

This can actually be handled in the input ZFIN GPAD file by using the annotation properties column (e.g. contributor-id=GOC:zfin_curators). If this is populated in the GPAD, the conversion code will set it in the model.
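As a rough illustration only (this is not the actual conversion code), reading such a property out of the annotation properties field could look like the sketch below. It assumes the field holds pipe-separated `key=value` pairs; `parse_annotation_properties` is a hypothetical helper name, not a function from ontobio or gocamgen.

```python
def parse_annotation_properties(field: str) -> dict[str, list[str]]:
    """Parse a properties string such as 'k1=v1|k2=v2' into {key: [values]}.

    Assumption: pairs are separated by '|' and each pair is 'key=value'.
    A key may appear more than once, so values are collected in a list.
    """
    props: dict[str, list[str]] = {}
    for pair in field.split("|"):
        pair = pair.strip()
        if not pair:
            continue  # ignore empty segments, e.g. an empty column
        key, sep, value = pair.partition("=")
        if not sep:
            continue  # ignore malformed segments that have no '='
        props.setdefault(key, []).append(value)
    return props


# Example property string using the value suggested above.
props = parse_annotation_properties("contributor-id=GOC:zfin_curators")
print(props.get("contributor-id"))
```

Collecting values in a list rather than a single string keeps the sketch safe if a row carries more than one contributor.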

  2. “NOT” annotations are not reported as “not”, they are reported as “regular/positive” annotations. Note: I have an example in which there are both “yes” and “not” annotations to the same term. For example: ZFIN:ZDB-GENE-000616-1

There's an open issue for this (https://github.com/geneontology/gocamgen/issues/10). I have some code for it that I still need to test. I'll use your example model here.

  3. ZFIN-PUB-### id which does not have a corresponding PMID are not reported in the model reference

I think this is a consequence of https://github.com/geneontology/gocamgen/issues/31 but let me know if you found an evidence that's completely missing any pub reference.

  4. “binding” term with IPI: the gene in the “with” field is displayed in the model as “has input” (which is correct), but the “with” field is missing in the evidence in the model.
  5. “binding” term with IPI: when the ID in the “with” field is not in the GPI, this ID was not reported as “has input”. Could we show this ID in the “has input” even though this ID might not be in Neo yet?
  6. same as 5, but the ID in the “with” field refers to a non-ZFIN gene

It'd probably be easiest to just discuss these on the next call with @ukemi and @vanaukenk.

  7. there are 2 dates in some of the “boxes” in which the evidences have different dates

Soon to be fixed! I have some new code (wasn't ready for this load) that will only use the max date here.
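The "max date" idea can be sketched as follows. This is a minimal illustration, not the gocamgen code: `latest_evidence_date` is a hypothetical name, and the sketch assumes evidence dates are ISO `YYYY-MM-DD` strings.

```python
from datetime import date


def latest_evidence_date(dates: list[str]) -> str:
    """Return the most recent date from a list of ISO 'YYYY-MM-DD' strings.

    Assumption: every evidence in a "box" carries one date in ISO format.
    """
    if not dates:
        raise ValueError("at least one evidence date is required")
    # Parse first so that malformed dates fail loudly instead of being
    # compared as plain strings.
    parsed = [date.fromisoformat(d) for d in dates]
    return max(parsed).isoformat()


# Two evidences with different dates collapse to a single model date.
print(latest_evidence_date(["2019-03-01", "2021-02-25"]))
```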

Thanks again for the feedback @sabrinatoro!

sierra-moxon commented 3 years ago

Thanks @dustine32, this is great! :) And thanks for pointing me to the code to run these models through ShEx as well! I've attached the summary output to this case. For number 1 above: the spec says contributor_id should be an ORCID, but it's OK for it to be a CURIE as per above, right?

activity_report.txt explanations.txt main_report.txt

dustine32 commented 3 years ago

@sierra-moxon Yep! A GOC:id CURIE is fine per some of our recent calls and also this ticket. The value in the GPAD's contributor-id property just needs to match a uri value in users.yaml in order for the Noctua landing page search and display to work.
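A quick way to picture that matching rule is the sketch below. It is illustrative only: the user entries are hypothetical in-memory records (the real users.yaml may be structured differently), and `contributor_is_known` is a made-up helper, not part of Noctua.

```python
def contributor_is_known(contributor_id: str, users: list[dict]) -> bool:
    """Return True if contributor_id equals the 'uri' of some user entry.

    Assumption: each user entry is a mapping with a 'uri' key, as loaded
    from a users.yaml-like file. Entries without a 'uri' are ignored.
    """
    known_uris = {user["uri"] for user in users if "uri" in user}
    return contributor_id in known_uris


# Hypothetical entries for illustration; not copied from the real file.
users = [
    {"uri": "GOC:zfin_curators", "nickname": "ZFIN curators"},
    {"nickname": "entry without a uri"},
]
print(contributor_is_known("GOC:zfin_curators", users))
```

The comparison is an exact string match, so a contributor-id that differs only in case or prefix from the uri would not be found.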

sierra-moxon commented 3 years ago

gorules_report.json.gz

dustine32 commented 3 years ago
  3. ZFIN-PUB-### id which does not have a corresponding PMID are not reported in the model reference

I just found an example of this (ZFIN:ZDB-TRNAG-011205-6) where it's completely dropping the reference due to some dumb bug that I can fix with just one line.

sierra-moxon commented 3 years ago

activity_report.txt explanations.txt main_report.txt gorules_report.json.gz

These are the ShEx check outputs from the latest round of models for your review @sabrinatoro. These should be available for review on noctua-dev.

sierra-moxon commented 3 years ago

gpad2.0.zfin_final.gz @sabrinatoro - gpad used

sabrinatoro commented 3 years ago

@dustine32 Here is my report of the issues I found after the latest round. All of them (except for the first one) are the same as before (you are probably still working on these, but I added them here for completeness). Since there is nothing new, the same examples can be used (but let me know if you want new examples).

dustine32 commented 3 years ago

@sabrinatoro Thanks for re-testing! Sorry the results haven't changed. I'm wondering if some of these were just due to the ontobio code being out of date. That weird contributor bug fix should have been in the ontobio/master branch since 2021-02-25 (specifically, this commit: https://github.com/biolink/ontobio/pull/529/commits/678ed6d8fbcde5a7a808defb6c22216119611b85). @sierra-moxon Could you check that this commit is in your ontobio git log? I was at least able to get the right contributor format when regenerating model ZFIN:ZDB-TRNAG-011205-38 with the current ontobio/master code.

For the NOTs, the missing references, and the multiple date issue, those fixes are in but not yet merged to master. @sierra-moxon If you want, you can pull this latest code from gocamgen branch and regen to see if these get fixed as well. As a good practice, I've been trying to wait till gocamgen changes get merged to master before generating the full MOD loads, but I often get too excited (as in this PR for WB/MGI).

Sorry for all the "branch" confusion! It keeps things spicy I guess.

sabrinatoro commented 3 years ago

@dustine32 I re-tested the models, and I couldn't find any issues. Thank you!

sierra-moxon commented 3 years ago

@sabrinatoro and I talked to ZFIN today. They are ready to start testing the round trip and want to be ready for the ZFIN models release with the June GO release. @kltm kindly volunteered to set up a branch of the pipeline with ZFIN models that we could get started on. :)