TranslatorSRI / NameResolution

A service for finding CURIEs from lexical strings.

Figure out how to reduce boosts on repeated terms #144

Open gaurav opened 4 months ago

gaurav commented 4 months ago

I'm trying to figure out how to solve a ranking problem. For example, https://name-resolution-sri-dev.apps.renci.org/lookup?string=brigatinib&autocomplete=true&offset=0&limit=10 returns UMLS:C4550665 as the top result, apparently (?) just because its preferred name contains brigatinib twice, which gives it a score of 135.28891 versus 98.12439 for the second result.
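To illustrate why a repeated term could inflate a score like this, here is a minimal sketch that assumes the search backend ranks with a BM25-style similarity (the thread does not confirm this). It uses the textbook BM25 term-frequency formula with the commonly cited defaults k1=1.2 and b=0.75; the document lengths are made-up values chosen so that length normalization cancels out. Real scores also depend on field boosts, document length, and inverse document frequency, so this only shows the direction of the effect.

```python
def bm25_tf_component(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Term-frequency part of the textbook BM25 score for one term in one field.

    k1 and b are the commonly cited defaults; whether the service uses
    them (or BM25 at all) is an assumption.
    """
    length_norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + length_norm)


# Hypothetical field lengths: both names are treated as average length,
# so only the term frequency differs between the two documents.
once = bm25_tf_component(tf=1, doc_len=4, avg_doc_len=4)
twice = bm25_tf_component(tf=2, doc_len=4, avg_doc_len=4)

print(f"term appears once:  {once:.3f}")
print(f"term appears twice: {twice:.3f}")
print(f"ratio:              {twice / once:.3f}")
```

Under these assumptions the second occurrence raises the term-frequency component by a bit more than a third. Lowering k1 makes the curve saturate sooner, which is one knob that could reduce the advantage of repeated terms if the backend exposes it.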

One powerful tool we have is boost phrase, which lets us say e.g. bp=names:human^2 to boost documents that have human in the names field. I'm not sure how to use it here, but I'm looking into it. It may also let us write something like clique_identifier_count:[5 TO *]^10 to strongly boost cliques with 5 or more identifiers.
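As a rough sketch of how such boosts could be attached to a request, the snippet below builds a query string with Python's standard library. It assumes the backend is Apache Solr using the eDisMax query parser, where the additive boost-query parameter is spelled bq and may be repeated; the thread itself does not name the backend, and it writes the parameter as bp. The helper name build_boost_params is hypothetical, and the field names names and clique_identifier_count are taken from the comment above.

```python
from urllib.parse import parse_qs, urlencode


def build_boost_params(user_query, boost_queries):
    """Return a URL-encoded query string with one bq entry per boost.

    Assumes Solr eDisMax semantics: each bq clause is optional and only
    adds to the score of documents that match it.
    """
    params = [("defType", "edismax"), ("q", user_query)]
    params += [("bq", boost) for boost in boost_queries]
    return urlencode(params)


query_string = build_boost_params(
    "brigatinib",
    [
        "names:human^2",                        # boost names containing "human"
        "clique_identifier_count:[5 TO *]^10",  # boost cliques with >= 5 identifiers
    ],
)

# Round-trip the encoded string to check that the boosts survive encoding.
decoded = parse_qs(query_string)
print(query_string)
print(decoded["bq"])
```

Because bq only adds to scores, it can lift well-populated cliques above a result that wins on term repetition, but it does not reduce the repetition bonus itself.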

Chatting with ChatGPT about this raised two possibilities: