TranslatorSRI / NameResolution

A service for finding CURIEs from lexical strings.
MIT License

Handle possessives and plurals better #10

Open cbizon opened 4 years ago

cbizon commented 4 years ago

"Parkinson's" returns hits, but "Parkinsons" does not. Other punctuation, such as commas and hyphens, appears to be ignored by Solr. Can we also ignore apostrophes?

The same happens with Alzheimer. Searching for "Alzheimer's" or "Alzheimer" returns over 20 hits, including terms from CHEBI as well as MONDO, while "Alzheimers" returns only 1. Forms like this will be handled inconsistently across naming schemes, so we should probably do some work here to make sure it doesn't matter.

gaurav commented 1 month ago

I like our current "solution" to this problem: we now have so many synonyms that we're likely to get both "Parkinson disease" and "Parkinson's disease" as exact synonyms, so we don't need to either (1) remove the quotes during synonym loading and querying, or (2) make the search a little bit fuzzy so that single-character insertions are ignored. In theory, either of those changes would let us better support terms where single quotes should not show up, but I don't think there are many of those.

On the other hand, reducing the number of synonyms would help make the Solr database smaller, which would be nice.
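
For reference, the two options listed above could be sketched roughly as follows. This is only an illustration, not code from NameResolution: `normalize_name`, `fuzzy_query`, and the set of apostrophe characters are hypothetical names and choices made for this sketch. The `~1` suffix is the fuzzy-search operator of the standard Lucene/Solr query syntax. The sketch assumes terms contain no other query-syntax special characters that would need escaping.

```python
# Hypothetical sketch of options (1) and (2); not the project's actual code.

# Apostrophe-like characters to drop. The exact set is an assumption.
APOSTROPHES = {"'", "\u2019", "\u2018", "\u02bc", "`"}


def normalize_name(text: str) -> str:
    """Option (1): drop apostrophes, casefold, and collapse whitespace.

    The same function would have to be applied both when synonyms are
    loaded and when a query comes in, so that the two sides match.
    """
    without_quotes = "".join(ch for ch in text if ch not in APOSTROPHES)
    return " ".join(without_quotes.casefold().split())


def fuzzy_query(text: str, max_edits: int = 1) -> str:
    """Option (2): append the Lucene/Solr fuzzy operator to every term."""
    return " ".join(f"{term}~{max_edits}" for term in text.split())


if __name__ == "__main__":
    for name in ("Parkinson's", "Parkinsons", "Alzheimer\u2019s  Disease"):
        print(repr(name), "->", repr(normalize_name(name)))
    print(fuzzy_query("Alzheimers disease"))
```

Note that removing apostrophes only makes "Parkinson's" and "Parkinsons" equivalent; it does not map either of them to "Parkinson", so the non-possessive form would still rely on synonyms or on fuzzy matching.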