TranslatorSRI / NameResolution

A service for finding CURIEs from lexical strings.

Searching for BRCA1 in autocomplete=true mode gives a lot of bad matches #149

Open gaurav opened 1 month ago

gaurav commented 1 month ago

See https://name-lookup.ci.transltr.io/lookup?string=BRCA1&autocomplete=true&offset=0&limit=10 (in this case, we're effectively searching for "BRCA1*")

It works fine if you turn off autocomplete (https://name-lookup.ci.transltr.io/lookup?string=BRCA1&autocomplete=false) or filter to Gene (https://name-lookup.ci.transltr.io/lookup?string=BRCA1&biolink_type=Gene), but since we have a concept with a preferred name that is exactly "BRCA1", we should have some way to prioritize that match.
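
For anyone who wants to reproduce the comparison, here is a minimal sketch that builds the three lookup URLs quoted above. The endpoint and query parameters are taken from those links; the `lookup_url` helper and the dictionary keys are hypothetical names used only for illustration.

```python
from urllib.parse import urlencode

# Endpoint taken from the URLs quoted in this issue.
BASE_URL = "https://name-lookup.ci.transltr.io/lookup"


def lookup_url(string, **params):
    """Build a /lookup URL; booleans are rendered as 'true'/'false'."""
    query = {"string": string}
    for key, value in params.items():
        query[key] = str(value).lower() if isinstance(value, bool) else value
    return f"{BASE_URL}?{urlencode(query)}"


# The three variants compared in this issue.
urls = {
    "autocomplete": lookup_url("BRCA1", autocomplete=True, offset=0, limit=10),
    "no_autocomplete": lookup_url("BRCA1", autocomplete=False),
    "gene_only": lookup_url("BRCA1", biolink_type="Gene"),
}

for label, url in urls.items():
    print(f"{label}: {url}")
```

Fetching each URL and comparing the top results should show the difference described above; the response format is not covered here, so the sketch stops at URL construction.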

Previously we would search for "(BRCA1) OR (BRCA1*)", but we removed that because it unfairly promoted terms in which the searched term was duplicated (e.g. we would prioritize "bone bone" over "bone", see #142). Hopefully we can find a better way of prioritizing preferred names over synonyms, or of boosting terms with multiple nodes, so that we don't need to bring that back. Another option would be to add an exact-match index on either preferred_name or LOWER(preferred_name) and then boost it heavily. It might be worth trying that out on NameRes-Loading to see how much bigger it makes the database backup.
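
To make the exact-match idea concrete, here is a rough sketch of how the query string might be assembled. It assumes a Lucene-style query syntax (as the "(BRCA1) OR (BRCA1*)" form suggests) and a separate exact-match field; the field name `preferred_name_exact`, the boost value, and both helper functions are hypothetical and not part of the current code.

```python
import re

# Characters with special meaning in Lucene-style query syntax.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def escape_term(term):
    """Backslash-escape query-syntax characters in a user-supplied term."""
    return _SPECIAL.sub(r"\\\1", term)


def build_query(term, exact_field="preferred_name_exact", boost=10):
    """Combine a boosted whole-name exact match with the prefix search.

    Assumption: `exact_field` indexes LOWER(preferred_name) as a single
    untokenized value, so it only matches when the whole name is equal.
    """
    escaped = escape_term(term.strip())
    exact = f'{exact_field}:"{escaped.lower()}"^{boost}'
    prefix = f"({escaped}*)"
    return f"{exact} OR {prefix}"


print(build_query("BRCA1"))
```

Because the exact clause would match on the whole preferred name rather than on individual tokens, a name like "bone bone" should not pick up the boost when searching for "bone", which is the problem the old "(X) OR (X*)" form ran into. Whether that holds in practice depends on how the field is indexed.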