Closed kltm closed 6 years ago
Apparently some of these were in progress:
https://github.com/geneontology/noctua-models/pull/43#issuecomment-326085314 https://github.com/geneontology/syngo2lego/issues/1
HI Seth,
Will try to get these fixed today - at least for cases where this is under my control:
Contributors are supplied in the form SynGO:SynGO-bagni
, rather than the specced ORCID. AFAIK, there is no resolver for this.
Edges between individuals contain no date (required), contributor (required), or providedBy (optional, soon required) information.
Evidence individuals are shared between edges. This is not part of the current or future model (where publications may be shared, but not the evidence).
The model-level providedBy appears to be the string SynGO-VU
, rather than a CURIE or URL.
@ftwkoopmans - Maybe change this to a link to the new SynGO web site? Is this public yet?
The model titles are not human readable, making (re)discovery and use possibly difficult.
While not necessarily an issue, I think it may be worthwhile considering the use of special group (providedBy) identifiers for bulk import model data sets so that end users can filter for their inclusion and exclusion. These SynGO models would only be reliable filterable on their current title. I would recommend a specific or additional providedBy.
For this model, there is no occurs_in evidence.
I have not tracked it down yet, but there is something about the evidence in this model on the part_of
that prevents it from rendering correctly. It may be related to one of the issues above.
@kltm - Looking at a randomly chosen model (gomodel-59a9023b00000151) it looks like edges without evidence have the additional date and contributor annotations, but those with evidence do not. Is this correct?
Re-running with evidence individuals fixed. Consistent with previous comment, I haven't added annotations directly to edges that are redundant with those on evidence individuals attached to those edges. I'll upload these models to a new branch/pull request. I have a branch edit that adds these additional annotation axioms so can switch quickly if necessary.
Sorry to run back through your notes this way; I wish there was threading, maybe we should split these out into separate issues?
For this model, there is no occurs_in evidence.
Fixed, but note that it's potentially misleading as there's one set of evidence per model and not all evidence for primary annotation will apply for occurs_in edge.
Yes, this is interesting--I hadn't fully realized that is what was going on here. We currently have no way of stating evidence for a model or individuals in it. In fact, it was a conscious choice that we tried and backed away from early on. I would hazard that it would be better to only have evidence where is was certain and leave "blanks" rather than have over-zealous application of evidence (which could cause oddities later on). @cmungall , any thoughts here?
While not necessarily an issue, I think it may be worthwhile considering the use of special group (providedBy) identifiers for bulk import model data sets so that end users can filter for their inclusion >> and exclusion. These SynGO models would only be reliable filterable on their current title. I would >>recommend a specific or additional providedBy.
Makes sense. Just need to agree on one.
I believe that the only thing we'd need for this is something in groups.yaml (I believe you may already have it here: https://github.com/geneontology/go-site/blob/master/metadata/groups.yaml#L34) and either replace or add in addition to the current geneontology.org.
The model titles are not human readable, making (re)discovery and use possibly difficult.
They are designed so that SynGO curators can move easily between their curation tool and Noctua. I guess we could potentially overload the title field with SynGO model number + short title, but I don't think SynGO has the text for this. @ftwkoopmans ?
I noticed that there was text in the model-level comment annotation. Could that possibly be re-used or shortened?
Edges between individuals contain no date (required), contributor (required), or providedBy (optional, soon required) information.
@kltm Just to check: In addition to the date and contributor on the evidence on each edge (object property assertion), you want a dc:date {some date}, dc:contributor { some source} as annotation axioms on the each edge (object property assertion) ? Let me know ASAP as very easy to add and re-run conversion.
I believe that is correct: every object property assertion (edge) and individual within a model (and the model as well) gets date, contributor, and providedBy. There should be nothing without these annotations.
Generally tagging/invoking @cmungall to check the modeling above.
As I understand each syngo model contains exactly one GPAD assertion, so we take "one set of evidence per model" as being equivalent to a set of evidence on the "primary" edge in the model.
For title how about generating based on species + gene + primary term, plus the id?
As I understand each syngo model contains exactly one GPAD assertion, so we take "one set of evidence per model" as being equivalent to a set of evidence on the "primary" edge in the model.
One primary term & one evidence set per syngo model, but there can be multiple extensions. It is possible for there to be multiple syngo models in one noctua model (generated so that can be linked manually later), but in that case there will be one evidence set for each syngo model. Given that, should I remove evidence from the extension?
For title how about generating based on species + gene + primary term, plus the id?
Seems reasonable & easy to implement. @ftwkoopmans - any opinion? Model IDs will still follow SynGO (although full URI will have additional UUID).
I was able to clone this project and run it on command-line sbt using the provided SynGO_export_test.json file as input. Here's my attempt to see whether the most recent code addresses these issues:
Date and contributor missing on edges These are on individuals but not axioms. Should be easy to fix.
Evidence missing on occurs_in from extensions Looks like David fixed this.
“contributor” - pulling username (“SynGO-*”) from json instead of ORCID. Waiting on SynGO to provide ORCID’s in export file. Or provide username-to-ORCID mapping?
“providedBy” - Still using "SynGO-VU", which is hard-coded in syngo2lego. Need to pick something that’s existing in groups.yaml. This could get added to groups.yaml but still need to format into URI (like “www.geneontology.org”).
Model titles - Currently SYNGO###. From @cmungall: “For title how about generating based on species + gene + primary term, plus the id?” So format would be: {Species}{UniProtID}_{goTerm}SYNGO{###}? Ex. “Mouse_Q9EQZ7_GO:0016082_SYNGO_113”
Evidence references on part_of edge won’t display right: As mentioned in syngo2lego issue #1, I think this is due to evidence individuals using “source” to specify PMID instead of SEPIO:0000124 term. Although, the same set of evidence (and thus individuals) display correctly on the enabled_by edge. Also, the regenerated model displays part_of references correctly on my server even though the “source” property is still used for PMID. Not sure yet what change fixed this. In 1/18/18 commit to SimpleModel.scala, it looks like David switched from reusing the same axiom for different edges to creating distinct ones (by calling gen_annotations). I can change it to use SEPIO anyways, making sure it doesn’t break again after.
For the example model, I wasn't able to find input json containing SYNGO_1371. Is this in a repository somewhere?
Thanks!
-Dustin
@dustine32 I'm happy to look at any model PR--feel free to put a PR on dev and we can try and get those up. Unfortunately, I'm not sure of the location of any upstream input JSON, you'd probably have to ask @dosumis ?
Hi @dustine32,
Thanks for working on this. I may not have updated the source JSON file on the repo. Will fix that. Don't see any pull request yet, but more than happy to review.
I was talking to the SynGO group earlier this week, and they plan to submit another bolus of annotations in the next couple of weeks.
@dosumis Thanks! I haven't put in a PR to syngo2lego yet because I've yet to change any code. I was more just making sure that I could run it and wrap my head around the current state of things. I did put a PR on noctua-models/dev so @kltm could take a look at some of syngo2lego's output when I ran it. It looked like some of the issues were fixed when comparing models currently in noctua-models/dev and the regenerated models.
@thomaspd Great! I wonder if the file source for these annotations is in the same JSON format (could've even fixed the ORCID issue themselves!).
@dosumis Thanks! I haven't put in a PR to syngo2lego yet because I've yet to change any code. I was more just making sure that I could run it and wrap my head around the current state of things. I did put a PR on noctua-models/dev so @kltm could take a look at some of syngo2lego's output when I ran it. It looked like some of the issues were fixed when comparing models currently in noctua-models/dev and the regenerated models.
Yep. IIRC , I fixed everything everything except model titles and the SEPIO thing (which I wasn't aware of, did this change?)
@thomaspd Great! I wonder if the file source for these annotations is in the same JSON format (could've even fixed the ORCID issue themselves!).
Should be same JSON. There's also a quick check of the json they provide via json schema.
@thomaspd - would be good to emphasise that we want orcids this time.
OK, now I see it's the contributor that needs to be the orcid. I will talk to them.
Simple validation of their json using json_schema is here: https://github.com/geneontology/syngo2lego_data_conversion
travis checks any JSON dropped into the json folder against the schema in spec.
orcid goes in model.username, which is where they put initials now.
Last dataset provided by VU is here: https://github.com/geneontology/syngo2lego_data_conversion/blob/master/json/SynGO_export_2017-12-18_production.json
Rad! Thanks @dosumis and @thomaspd !
@dustine32 Looking at latest model batch, basing primarily on http://noctua-dev.berkeleybop.org/editor/graph/gomodel:SYNGO_1371 :
Great things:
occurs_in
evidenceThings that can still be worked on?
I think this is great and hope we can get these into production soon--these are a huge improvement in every way and I think easily meet all the required criteria for a "production" model. Thank you!
Assuming no action on the title, I guess the next step would be a PR against the master
branch. Once opened, the merge would have to be timed so that we could take the production site offline for a few minutes to rebuild.
Thanks much @kltm ! Just put in the PR to master. But now that I look at the title issue again, I suppose I may not have officially asked @ftwkoopmans what his opinion on title generation was. @thomaspd @huaiyumi I know this came up in our meeting this week as well. @cmungall 's suggestion from above was:
For title how about generating based on species + gene + primary term, plus the id?
@dustine32 A title would be an awesome bow on the PR package. I'd hope that anything would be an improvement on the current.
@lpalbou Hey Laurent, here's the ticket that tracked some things needing fixing in the SynGO models. Let me know what field ("oboInOwl" something) and what value should be added to the model files and I can plug away. An example would help for sure.
@dustine32 thinks this is cleared, as do I.
From https://github.com/geneontology/noctua-models/pull/54 Looking at http://noctua-dev.berkeleybop.org/editor/graph/gomodel:SYNGO_1371 as an example, apparent issues with SynGO models:
SynGO:SynGO-bagni
, rather than the specced ORCID. AFAIK, there is no resolver for this.SynGO-VU
, rather than a CURIE or URL.part_of
that prevents it from rendering correctly. It may be related to one of the issues above.Tagging @cmungall @dosumis