Open ValWood opened 6 years ago
Weighting is a good idea.
Would a universal weighting scheme for end-users and curators make sense, or do you think we'd want to weight differently?
We do some weighting in Canto, on a case-by-case basis.
These ones are tricky, I think because: i) they can only be "related synonyms", since they are used in other contexts (we give more weight to exact synonyms); ii) the actual term name is necessarily longer, because it needs to be qualified: "signalling receptor activity".
It's almost as though we need a label to give "community usage" higher priority.
If the synonym is an exact match to a string (rather than a substring) it should also score more highly. I thought we already did that in Canto but maybe that also needs to be exact. Perhaps "exact match" to any synonym should be prioritised over any "sub-string match".
@tonysawfordebi @kimrutherford may have insights
> If the synonym is an exact match to a string (rather than a substring) it should also score more highly. I thought we already did that in Canto but maybe that also needs to be exact.
In Canto exact matches appear slightly higher in the search than prefix matches. So if you search for "splice" a term name containing "spliceosomal" will generally appear lower than a name containing "splice site ...".
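A minimal sketch of the ordering discussed above, not Canto's actual code. The rank tables, the `Label` class, the term IDs (`T1`, `S1`, ...) and some of the label strings are hypothetical and only there for illustration. It sorts first by how the query matches (whole string, then whole word, then word prefix, then substring) and only then by synonym scope, so an exact hit on any synonym beats a partial hit on a term name.

```python
import re
from dataclasses import dataclass

# Hypothetical rank tables: a higher number sorts first.
MATCH_RANK = {"exact_string": 3, "whole_word": 2, "word_prefix": 1, "substring": 0}
SCOPE_RANK = {"name": 2, "exact": 2, "narrow": 1, "broad": 1, "related": 0}


@dataclass
class Label:
    term_id: str
    text: str
    scope: str  # "name" for the term name, otherwise the synonym scope


def match_kind(query, text):
    """Classify how the query matches a label, or return None for no match."""
    q, t = query.lower().strip(), text.lower()
    if q == t:
        return "exact_string"
    if re.search(r"\b" + re.escape(q) + r"\b", t):
        return "whole_word"
    if re.search(r"\b" + re.escape(q), t):
        return "word_prefix"
    if q in t:
        return "substring"
    return None


def rank(query, labels):
    """Return term IDs ordered by their best (match kind, scope) pair."""
    best = {}
    for label in labels:
        kind = match_kind(query, label.text)
        if kind is None:
            continue
        key = (MATCH_RANK[kind], SCOPE_RANK[label.scope])
        if key > best.get(label.term_id, (-1, -1)):
            best[label.term_id] = key
    return sorted(best, key=lambda term_id: best[term_id], reverse=True)


receptor_labels = [
    Label("T2", "G protein-coupled receptor activity", "name"),
    Label("T1", "signaling receptor activity", "name"),
    Label("T1", "receptor activity", "related"),
]
splice_labels = [
    Label("S1", "spliceosomal complex", "name"),
    Label("S2", "splice site selection", "name"),
]

print(rank("receptor activity", receptor_labels))
print(rank("splice", splice_labels))
```

With this ordering the grouping term `T1` comes out ahead of its descendant for "receptor activity", and "splice site ..." comes out ahead of "spliceosomal ..." for "splice". A tool that multiplies match and scope weights instead could still let a related synonym sink, depending on the numbers chosen.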
In Canto for "receptor activity" you get
and in QuickGO
Of course, in QuickGO "signalling receptor activity" will be somewhere down the list, but for a common parent grouping term you would expect it to be more prominent.
It's amazing how a demo makes you realise what strange workarounds you employ to find things. I imagine hardly anyone would be able to locate this term unless they knew the ontology well, or the precise term name.
For AmiGO, a search on "receptor activity" does not find "signaling receptor activity" in the first 100 hits...
Re: weighting exact over other syns, this makes sense; I do this when doing any kind of NER task. I wish more NLP people would make use of the synonym scopes.
GO is generally quite consistent here, but it bugs me when we make "apoptosis" a narrow syn of "apoptotic process". This is maybe justified from a strict ontological point of view, but from a pragmatic term-matching point of view it's not ideal. Also, I know some NLP people throw out non-exact synonyms, thus throwing out "apoptosis"...
I guess what GO does should be ontologically correct. Maybe we need another synonym type, meaning "not exact, but the community mean exactly the same thing when they use this" (or a flag on any synonym type because they can be broad, narrow, or related). These are the ones that need to percolate to the top of the search.
OBO format does have "synonym types", as distinct from the exact/related/etc. synonym scopes. I don't know what OWL would make of that, but if it's OWL-able there could be a type for "community usage" and it could confer higher weight in a search.
Yes, it all gets represented in OWL. Both scope and extensible types are represented as "axiom annotations" on the basic synonym triple.
I'd go the other direction - make the scope exact, and include an extra synonym type saying "not quite exactly the same thing in a fussy ontological sense". Make things easier for the default user.
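For illustration, here is a rough sketch of how a search tool could read the optional synonym type from an OBO `synonym:` line and use it to boost a synonym. The type ID `community_usage`, the weight values and the function names are all hypothetical. No such type is defined in GO, and the "receptor activity" line is a made-up example of the first proposal rather than a line from the ontology.

```python
import re

# OBO synonym lines look like:  synonym: "text" SCOPE [TYPE] [xrefs]
# A type has to be declared in the file header with a `synonymtypedef:` line.
SYNONYM_RE = re.compile(
    r'^synonym:\s+"(?P<text>(?:[^"\\]|\\.)*)"\s+'
    r'(?P<scope>EXACT|BROAD|NARROW|RELATED)'
    r'(?:\s+(?P<type>[^\s\[]+))?'
    r'\s*\[(?P<xrefs>[^\]]*)\]\s*$'
)

# Hypothetical values, chosen only for illustration.
SCOPE_WEIGHT = {"EXACT": 1.0, "NARROW": 0.6, "BROAD": 0.6, "RELATED": 0.5}
COMMUNITY_TYPE = "community_usage"  # hypothetical synonym type ID


def parse_synonym(line):
    """Return (text, scope, synonym_type); synonym_type is None if absent."""
    m = SYNONYM_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not an OBO synonym line: {line!r}")
    return m.group("text"), m.group("scope"), m.group("type")


def search_weight(scope, synonym_type):
    """Weight a synonym like an exact one when it carries the community type."""
    if synonym_type == COMMUNITY_TYPE:
        return SCOPE_WEIGHT["EXACT"]
    return SCOPE_WEIGHT[scope]


for line in (
    'synonym: "receptor activity" RELATED community_usage []',
    'synonym: "apoptosis" NARROW []',
):
    text, scope, synonym_type = parse_synonym(line)
    print(text, scope, synonym_type, search_weight(scope, synonym_type))
```

The opposite proposal (scope EXACT plus a "not strictly exact" type) would need no special case in the search code at all; only consumers that care about strict ontological equivalence would have to check the type.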
I was going through some annotation with a new curator and I noticed that some terms are really difficult to locate.
Take, for example, "signalling receptor activity", which describes a canonical "receptor activity". To find this term you need to know to use the search term "signalling receptor activity", because you can't reach it with "receptor activity"; you only locate descendants.
This is the case in all tools, probably because "receptor activity" is only a related synonym (since it is used in different contexts).
To find these terms I have a workaround: go to the QuickGO graph and then locate the parent. Clearly this isn't ideal, especially for new people. Is there any way to "weight" terms like this in tools?