EBISPOT / zooma

ZOOMA - Optimal Ontology Mapping Application. http://www.ebi.ac.uk/spot/zooma.
Apache License 2.0

Poor matches against SO, likely due to use of underscores #57

Open cmungall opened 3 years ago

cmungall commented 3 years ago

Searching for splice site, I would expect HIGH confidence matches for SIO and SO, as these exactly match the main name:

$ curl -L -s 'http://www.ebi.ac.uk/spot/zooma/v2/api/services/annotate?propertyValue=splice+site' | jq '.[] | .confidence, .semanticTags, .annotatedProperty.propertyValue'
"MEDIUM"
[
  "http://semanticscience.org/resource/SIO_010451"
]
"splice site"
"MEDIUM"
[
  "http://purl.obolibrary.org/obo/SO_0000162"
]
"splice_site"

SO uses underscores in names (arguably a bug in SO, which I may be partly to blame for.. but it is how it is), and indeed if I search using underscores:

$ curl -L -s 'http://www.ebi.ac.uk/spot/zooma/v2/api/services/annotate?propertyValue=splice_site' | jq '.[] | .confidence, .semanticTags, .annotatedProperty.propertyValue'
"GOOD"
[
  "http://purl.obolibrary.org/obo/SO_0000162"
]
"splice_site"

However, a user is unlikely to know that underscores are needed when searching SO.

Recommendations/questions:

  1. treat underscores identically to spaces, both when indexing and when searching
  2. the first hit should return a high confidence match to SIO
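
Recommendation 1 could be sketched roughly as follows. This is only an illustration of the idea, not Zooma's actual indexing code: `normalize_label` and the toy `index` dictionary are hypothetical names, and the two labels are the ones returned by the queries above.

```python
import re

# Runs of underscores and/or whitespace are treated as one separator.
_SEPARATORS = re.compile(r"[_\s]+")


def normalize_label(text: str) -> str:
    """Collapse underscores and whitespace to single spaces and lower-case."""
    return _SEPARATORS.sub(" ", text).strip().lower()


# Toy index (hypothetical): the same normalization is applied at index time
# and at query time, so both spellings end up under one key.
index = {}
for label, iri in [
    ("splice site", "http://semanticscience.org/resource/SIO_010451"),
    ("splice_site", "http://purl.obolibrary.org/obo/SO_0000162"),
]:
    index.setdefault(normalize_label(label), []).append(iri)

# Either spelling of the query now finds both the SIO and the SO term.
hits = index[normalize_label("splice site")]
```

With this kind of normalization, `splice site` and `splice_site` are indistinguishable to the matcher, so the SO term would no longer be penalized for its underscore.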

bgood-d4c commented 3 years ago

@cmungall I think your indexing suggestion makes sense. (Probably the same for hyphens.) If you tell Zooma to look in SIO specifically, you can get it to give you a GOOD hit for 'splice site'.

cmungall commented 3 years ago

Hi Ben! In this case I have other terms for which I need the deeper SO hierarchy... but not everything is expected to be in SO, so I can't convert my input to underscores... I could do two queries, I guess :(
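
For reference, the two-query workaround might look something like the sketch below. It only builds the request URLs and merges already-parsed JSON responses; `query_urls`, `merge_best`, and the confidence ordering in `RANK` are assumptions made for illustration, and the sample responses are copied from the output earlier in this issue.

```python
from urllib.parse import urlencode

BASE = "http://www.ebi.ac.uk/spot/zooma/v2/api/services/annotate"

# Assumed ordering of confidence labels, lowest to highest.
RANK = {"LOW": 0, "MEDIUM": 1, "GOOD": 2, "HIGH": 3}


def query_urls(term):
    """Return one URL for the term as typed, plus one underscored variant."""
    variants = [term]
    underscored = "_".join(term.split())
    if underscored != term:
        variants.append(underscored)
    return [f"{BASE}?{urlencode({'propertyValue': v})}" for v in variants]


def merge_best(*responses):
    """Keep the highest confidence seen for each semantic tag."""
    best = {}
    for response in responses:
        for hit in response:
            for tag in hit["semanticTags"]:
                if tag not in best or RANK[hit["confidence"]] > RANK[best[tag]]:
                    best[tag] = hit["confidence"]
    return best


SIO = "http://semanticscience.org/resource/SIO_010451"
SO = "http://purl.obolibrary.org/obo/SO_0000162"

# Sample responses, reduced to the fields shown in the jq output above.
spaced = [
    {"confidence": "MEDIUM", "semanticTags": [SIO]},
    {"confidence": "MEDIUM", "semanticTags": [SO]},
]
underscored = [{"confidence": "GOOD", "semanticTags": [SO]}]

urls = query_urls("splice site")
merged = merge_best(spaced, underscored)
```

This doubles the number of requests for every multi-word term, which is why handling it on the Zooma side would be preferable.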

henrietteharmse commented 3 years ago

It would be helpful if Zooma could be more resilient to potential human error.