geneontology / noctua

Graph-based modeling environment for biology, including prototype editor and services
http://noctua.geneontology.org/
BSD 3-Clause "New" or "Revised" License

Autocomplete behavior for evidence codes #234

Closed vanaukenk closed 8 years ago

vanaukenk commented 8 years ago

Hi,

Curators are probably most familiar with the three-letter evidence code abbreviations. Typing those in the box brings up the commonly used GO ECO codes for some, but not all, evidence codes. For example, IMP, IGI, and IPI (with a space after the abbreviation) all return the appropriate code in the list, but IDA and IEP don't. Do we want the autocomplete to give priority to matching synonyms in ECO, or should we just encourage curators to search with the text string of the name or the ECO ID?

Thx. --K.

cmungall commented 8 years ago

I suspect this is a Solr tuning issue. It's odd that 'transport assay evidence' comes up when you type IDA. I suspect it's something to do with the ':' in the synonym:

id: ECO:0000134
name: transport assay evidence
synonym: "IDA: transport assay" RELATED []

vs

id: ECO:0000314
name: direct assay evidence used in manual assertion
synonym: "IDA" RELATED [GOECO:IDA]
synonym: "inferred from direct assay" EXACT [GOECO:IDA]
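To make the priority question concrete, here is a minimal sketch of a ranking in which an exact match on a name or synonym beats a prefix match, which in turn beats a substring match. It is only an illustration of the desired ordering for the two stanzas above; it is not how Solr or the shared AmiGO code scores results, and the names `TERMS`, `match_rank`, and `suggest` are hypothetical.

```python
# Illustrative only: the two ECO stanzas quoted above, as plain dicts.
TERMS = [
    {
        "id": "ECO:0000134",
        "name": "transport assay evidence",
        "synonyms": ["IDA: transport assay"],
    },
    {
        "id": "ECO:0000314",
        "name": "direct assay evidence used in manual assertion",
        "synonyms": ["IDA", "inferred from direct assay"],
    },
]


def match_rank(term, query):
    """Return 0 for an exact label match, 1 for a prefix, 2 for a substring, None otherwise."""
    q = query.strip().lower()
    labels = [label.lower() for label in [term["name"]] + term["synonyms"]]
    if q in labels:
        return 0
    if any(label.startswith(q) for label in labels):
        return 1
    if any(q in label for label in labels):
        return 2
    return None


def suggest(query, terms=TERMS):
    """Return matching term IDs, best rank first (hypothetical helper)."""
    ranked = [(match_rank(t, query), t["id"]) for t in terms]
    return [term_id for rank, term_id in sorted(r for r in ranked if r[0] is not None)]


print(suggest("IDA"))
print(suggest("inferred from direct assay"))
```

With this ordering, typing IDA would list ECO:0000314 ahead of ECO:0000134, because "IDA" is an exact synonym of the former and only a prefix of a synonym of the latter.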

We have an open ticket about this in the amigo tracker, as this is shared code: https://github.com/geneontology/amigo/issues/131

The solution is to switch to Solr 5... but we don't intend to do that until next year.

What to do in the interim? Unfortunately it's not very obvious with ECO how the names map to the codes. For example, if you type the full name of IDA, "inferred from direct assay", it comes up with:

 direct assay evidence used in manual assertion/ECO:0000314

A curator might think, "no I don't want that, I want IDA". But in fact that is IDA.

I think for the meeting we just have to give people a table that maps the code to the full name in ECO.

ECO ID       Code  Full name in ECO
ECO:0000245  RCA   computational combinatorial evidence used in manual assertion
ECO:0000247  ISA   sequence alignment evidence used in manual assertion
ECO:0000303  NAS   non-traceable author statement used in manual assertion
ECO:0000304  TAS   traceable author statement used in manual assertion
ECO:0000314  IDA   direct assay evidence used in manual assertion
ECO:0000315  IMP   mutant phenotype evidence used in manual assertion
ECO:0000316  IGI   genetic interaction evidence used in manual assertion
ECO:0000317  IGC   genomic context evidence used in manual assertion
ECO:0000318  IBA   biological aspect of ancestor evidence used in manual assertion
ECO:0000319  IBD   biological aspect of descendant evidence used in manual assertion
ECO:0000320  IKR   phylogenetic determination of loss of key residues evidence used in manual assertion
ECO:0000320  IMR   phylogenetic determination of loss of key residues evidence used in manual assertion
ECO:0000321  IRD   rapid divergence from ancestral sequence evidence used in manual assertion
ECO:0000353  IPI   physical interaction evidence used in manual assertion
ECO:0000501  IEA   evidence used in automatic assertion
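For anyone who would rather script the lookup than consult the table, the same mapping can be sketched as a small Python dictionary. The data is transcribed from the table as posted; the names `GO_CODE_TO_ECO` and `eco_for_code` are hypothetical and not part of Noctua or AmiGO.

```python
# Code-to-ECO mapping transcribed from the table in this comment.
# IKR and IMR both map to ECO:0000320 in that table.
GO_CODE_TO_ECO = {
    "RCA": ("ECO:0000245", "computational combinatorial evidence used in manual assertion"),
    "ISA": ("ECO:0000247", "sequence alignment evidence used in manual assertion"),
    "NAS": ("ECO:0000303", "non-traceable author statement used in manual assertion"),
    "TAS": ("ECO:0000304", "traceable author statement used in manual assertion"),
    "IDA": ("ECO:0000314", "direct assay evidence used in manual assertion"),
    "IMP": ("ECO:0000315", "mutant phenotype evidence used in manual assertion"),
    "IGI": ("ECO:0000316", "genetic interaction evidence used in manual assertion"),
    "IGC": ("ECO:0000317", "genomic context evidence used in manual assertion"),
    "IBA": ("ECO:0000318", "biological aspect of ancestor evidence used in manual assertion"),
    "IBD": ("ECO:0000319", "biological aspect of descendant evidence used in manual assertion"),
    "IKR": ("ECO:0000320", "phylogenetic determination of loss of key residues evidence used in manual assertion"),
    "IMR": ("ECO:0000320", "phylogenetic determination of loss of key residues evidence used in manual assertion"),
    "IRD": ("ECO:0000321", "rapid divergence from ancestral sequence evidence used in manual assertion"),
    "IPI": ("ECO:0000353", "physical interaction evidence used in manual assertion"),
    "IEA": ("ECO:0000501", "evidence used in automatic assertion"),
}


def eco_for_code(code):
    """Return (ECO ID, ECO name) for a GO evidence code, ignoring case and surrounding spaces."""
    return GO_CODE_TO_ECO[code.strip().upper()]


eco_id, eco_name = eco_for_code("IDA")
print(eco_id, eco_name)
```

A code that is not in the table (IEP, for instance, is not listed here) raises a `KeyError` rather than guessing.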

A key rule is that if you're manually entering it, the full name in ECO will end in "used in manual assertion".
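That rule is easy to check mechanically. A minimal sketch, assuming only the naming convention stated above (the function name `looks_manual` is hypothetical):

```python
MANUAL_SUFFIX = "used in manual assertion"


def looks_manual(eco_name):
    """True if an ECO term name follows the 'used in manual assertion' naming convention."""
    return eco_name.strip().endswith(MANUAL_SUFFIX)


print(looks_manual("direct assay evidence used in manual assertion"))
print(looks_manual("evidence used in automatic assertion"))
print(looks_manual("transport assay evidence"))
```

Only the first name passes. The IEA term and the bare 'transport assay evidence' term that shows up when typing IDA both fail the check.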

If people want to use more specific evidence types, this table shows how GOREFs deepen evidence types: http://purl.obolibrary.org/obo/eco/gaf-eco-mapping.txt


kltm commented 8 years ago

This is indeed a bit of a mapping issue and a bit more of the issue Chris mentioned. I'm not a fan of duplicate issue tracking, so since this is tagged for searching, I'm going to close out.

cmungall commented 8 years ago

Now we have nowhere to indicate that this is an unresolved issue. Even if the resolution is to have a web page showing people the mapping and they consult that, it should be noted.

On 3 Dec 2015, at 11:07, kltm wrote:

Closed #234.



vanaukenk commented 8 years ago

I've added the mappings above to the end of the LEGO Google doc where David also put a Glossary of terms. Having the info in more than one place would not be a bad idea, though.

kltm commented 8 years ago

Multiple places brings up the risk of drift. What is the current best location of the mapping documentation?

ukemi commented 8 years ago

It's not really mapping. The originals live in the ECO ontology. Maybe a link to Ontobee in the LEGO documentation?

kltm commented 8 years ago

Okay, my understanding is that ECO is the real deal, with codes being legacy, and not completely bijective with ECO. I was under the impression that there was some documentation to clarify the cases where there is no clear mapping?

ukemi commented 8 years ago

Ah. I see what you mean. Why not just get people used to using ECO and away from codes? When I work in Noctua, I put on my ECO hat. Although at first I was a bit confused by the 'used in manual assertion' codes, once Marcus explained them in DC, it became pretty clear.

cmungall commented 8 years ago

+1 to that, but we don't want ontological fussiness to get in the way of adoption. It may take some time to wean curators off codes. It may be better to do this as part of a larger rethink about how capturing assay results fits into GO curation.
