geneontology / go-site

A collection of metadata, tools, and files associated with the Gene Ontology public web presence.
http://geneontology.org
BSD 3-Clause "New" or "Revised" License
46 stars, 89 forks

Check representation of IDA evidence code in GOLR - issue with autocomplete in Noctua #1221

Closed: vanaukenk closed this issue 4 years ago

vanaukenk commented 4 years ago

@tmushayahama @kltm

The IDA evidence code (ECO:0000314) does not appear in the Noctua Form 2.0 autocomplete when querying with the three-letter abbreviation, 'IDA'.

When entered, all other three-letter abbreviations return the expected evidence code in the autocomplete list, although the correct code doesn't always appear at the top.

I tested all of the three-letter codes listed in the guide below, except IEA, which we don't allow for manual curation in the Noctua Form:

http://wiki.geneontology.org/index.php/Guide_to_GO_Evidence_Codes

kltm commented 4 years ago

I suspect the issue is not that IDA fails to appear, but that so many ECO terms have the synonym "IDA" that the "real" one gets cut off:

http://noctua-golr.berkeleybop.org/select?defType=edismax&qt=standard&indent=on&wt=json&rows=10&start=0&fl=annotation_class,description,source,idspace,synonym,alternate_id,annotation_class_label,score,id&facet=true&facet.mincount=1&facet.sort=count&json.nl=arrarr&facet.limit=25&hl=true&hl.simple.pre=%3Cem%20class=%22hilite%22%3E&hl.snippets=1000&fq=document_category:%22ontology_class%22&fq=is_obsolete:%22false%22&fq=source:%22eco%22&facet.field=source&facet.field=idspace&facet.field=subset&facet.field=is_obsolete&q=IDA*&qf=annotation_class^3&qf=annotation_class_label_searchable^5.5&qf=description_searchable^1&qf=synonym_searchable^1&qf=alternate_id^1
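To make the query above easier to read, here is a minimal sketch that rebuilds its core parameters in Python. The helper name `eco_autocomplete_url` is hypothetical, highlighting and faceting parameters are omitted, and no request is sent. It only shows that the linked query asks for 10 rows, filters to non-obsolete ECO classes, uses a trailing wildcard, and gives synonyms the same boost as descriptions; whether Noctua's autocomplete uses exactly these values is an assumption.

```python
from urllib.parse import urlencode

# Endpoint taken from the link above; nothing is fetched here.
GOLR_SELECT = "http://noctua-golr.berkeleybop.org/select"


def eco_autocomplete_url(term: str, rows: int = 10) -> str:
    """Rebuild the core of the linked ECO query (highlighting/facets omitted)."""
    params = [
        ("defType", "edismax"),
        ("qt", "standard"),
        ("wt", "json"),
        ("rows", str(rows)),  # only the top `rows` ranked hits come back
        ("start", "0"),
        # filter queries: non-obsolete ECO classes only
        ("fq", 'document_category:"ontology_class"'),
        ("fq", 'is_obsolete:"false"'),
        ("fq", 'source:"eco"'),
        ("q", term + "*"),  # trailing wildcard: anything starting with the term
        # query fields and boosts, copied from the link above
        ("qf", "annotation_class^3"),
        ("qf", "annotation_class_label_searchable^5.5"),
        ("qf", "description_searchable^1"),
        ("qf", "synonym_searchable^1"),
        ("qf", "alternate_id^1"),
    ]
    # urlencode accepts a list of pairs, so repeated keys (fq, qf) are preserved
    return GOLR_SELECT + "?" + urlencode(params)


print(eco_autocomplete_url("IDA"))
```

With only ten rows returned, every ECO class carrying an "IDA..." synonym competes with ECO:0000314 for the same ten slots, which is consistent with the cut-off described above.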

Note that they all have the synonym "IDA". Looking at the ECO source, many of these are "RELATED" rather than "EXACT" synonyms, of the form "IDA: blah blah blah". I'm not sure we can code around that quickly, as I believe AmiGO/GOlr currently has no notion of synonym scope. Perhaps this is something to address in the ontology? @cmungall
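For context on "synonym scope": in OBO-format files each `synonym:` line carries a scope of EXACT, BROAD, NARROW or RELATED. The sketch below shows what scope-aware indexing could look like, assuming a hypothetical indexer that only copies EXACT synonyms into the searchable synonym field. Per the comment above, this is not how GOlr currently behaves; the function names and the example stanza (including its ID) are hypothetical.

```python
import re

# OBO synonym tag: synonym: "<text>" <SCOPE> [<xrefs>]
SYNONYM_RE = re.compile(
    r'^synonym:\s+"((?:[^"\\]|\\.)*)"\s+(EXACT|BROAD|NARROW|RELATED)\b'
)


def parse_synonyms(stanza_lines):
    """Return (text, scope) pairs for every synonym line in one stanza."""
    pairs = []
    for line in stanza_lines:
        m = SYNONYM_RE.match(line.strip())
        if m:
            pairs.append((m.group(1), m.group(2)))
    return pairs


def searchable_synonyms(stanza_lines, scopes=("EXACT",)):
    """Keep only synonyms whose scope would be indexed (EXACT by default)."""
    return [text for text, scope in parse_synonyms(stanza_lines) if scope in scopes]


# Hypothetical stanza shaped like the ECO entries described above.
stanza = [
    "id: ECO:9999999",
    'synonym: "IDA: hypothetical child assay" RELATED []',
    'synonym: "hypothetical assay evidence" EXACT []',
]
print(searchable_synonyms(stanza))
```

Under this assumption, the "IDA: ..." RELATED synonym would never reach the search index, so it could not crowd out the parent term.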

vanaukenk commented 4 years ago

Note that ECO has removed the three-letter GO code prefix from its synonyms, and this change will be reflected in the 2020-02-07 release.

https://github.com/evidenceontology/evidenceontology/issues/247

Once ECO is updated on our end, we need to test the autocomplete in Noctua to make sure this fixes the signal-to-noise problem that was drowning out the 'IDA' result (ECO:0000314).
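The effect of dropping the prefix can be illustrated with a simplified model of the `q=IDA*` search: split each synonym into lowercase tokens and count a term as a hit if any token starts with the query. This is only a rough stand-in for Solr's analysis chain, the helper names are hypothetical, and the term data (including the child IDs) is illustrative rather than taken from ECO.

```python
import re

# Matches a leading three-letter GO code plus colon, e.g. "IDA: ".
GO_CODE_PREFIX = re.compile(r"^[A-Z]{3}:\s+")


def strip_go_code_prefix(synonym):
    """Drop a leading 'XXX: ' prefix; bare codes such as 'IDA' are untouched."""
    return GO_CODE_PREFIX.sub("", synonym)


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def prefix_hits(terms, query):
    """IDs of terms with any synonym token starting with `query` (like q=IDA*)."""
    q = query.lower()
    return [
        term_id
        for term_id, synonyms in terms.items()
        if any(tok.startswith(q) for s in synonyms for tok in tokens(s))
    ]


# Illustrative data: one parent with a bare "IDA" synonym, two hypothetical children.
before = {
    "ECO:0000314": ["IDA"],
    "child_A": ["IDA: hypothetical child assay one"],
    "child_B": ["IDA: hypothetical child assay two"],
}
after = {tid: [strip_go_code_prefix(s) for s in syns] for tid, syns in before.items()}

print(prefix_hits(before, "IDA"))
print(prefix_hits(after, "IDA"))
```

In this toy model, every term matches before the change, while afterwards only the term whose synonym is the bare code still matches, which is the improvement the ticket hopes to see.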

kltm commented 4 years ago

@vanaukenk Hey, that's really great news! We can trigger and deploy a build after the seventh and see how it works for us.

kltm commented 4 years ago

@vanaukenk Now ready to test.

vanaukenk commented 4 years ago

@kltm - thank you!

I've tested the Evidence Code field search on Noctua-dev for each of the Noctua form implementations (Create Activity, BP only, CC only) and the graph editor.

Searching with 'IDA' now returns the correct evidence code without also returning all of the children of IDA (ECO:0000314) from ECO. Excellent!

I did notice one difference in the search results between the form and the graph editor: searching with 'ida' in the form also returns HDA, perhaps because of a string match in the HDA comment field?

 Comment: When using the HDA evidence code, the guidelines for IDA should be adhered to 
 (http://geneontology.org/page/ida-inferred-direct-assay)

Note that this behavior also holds for the other three-letter evidence codes that have a corresponding high-throughput code, i.e. IGI/HGI, IMP/HMP, and IEP/HEP.
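The HDA hit is consistent with the two editors querying different sets of fields. The sketch below assumes, hypothetically, that the form's search includes a comment field and the graph editor's does not; the thread only suspects this. The `search` helper and record layout are hypothetical, and the HDA comment is the text quoted above.

```python
import re

# Hypothetical records keyed by GO code; the HDA comment is quoted from the thread.
TERMS = {
    "IDA": {"synonym": ["IDA"], "comment": []},
    "HDA": {
        "synonym": ["HDA"],
        "comment": [
            "When using the HDA evidence code, the guidelines for IDA "
            "should be adhered to"
        ],
    },
}


def search(terms, query, fields):
    """Codes with any token in `fields` starting with `query` (case-insensitive)."""
    q = query.lower()
    hits = []
    for code, record in terms.items():
        values = [v for field in fields for v in record.get(field, [])]
        if any(
            tok.startswith(q)
            for v in values
            for tok in re.findall(r"[a-z0-9]+", v.lower())
        ):
            hits.append(code)
    return hits


print(search(TERMS, "ida", ["synonym"]))             # assumed graph-editor fields
print(search(TERMS, "ida", ["synonym", "comment"]))  # assumed form fields
```

If this assumption holds, unifying the two editors would mostly mean giving them the same list of query fields.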

If need be, we could create a separate ticket to standardize this behavior, but I think the problem that prompted this ticket has been addressed.

Here are some screenshots of what I see:

[Two screenshots (images not preserved)]

@tmushayahama

vanaukenk commented 4 years ago

Closing this ticket, as the original issue has been fixed with help from ECO.
I will open a new ticket in Noctua to track the need to unify autocomplete behavior between the form and graph editors.