TranslatorSRI / NameResolution

A service for finding CURIEs from lexical strings.

Consider different autocomplete blocklists for different MVPs #120

Open gaurav opened 7 months ago

gaurav commented 7 months ago

At the moment, the NameRes blocklist is implemented in the simplest possible way:

  1. A list of CURIEs is stored in a private GitHub repository.
  2. When starting, NameRes downloads the list of CURIEs and deletes them from its Solr database.
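The startup step above could be sketched roughly as follows. This is not the actual NameRes code: the blocklist file format (one CURIE per line, `#` for comments), the Solr field name `curie`, and the helper names are all assumptions made for illustration. The sketch only builds the JSON body for a Solr delete-by-query request; fetching the list from the private repository and posting the body to Solr are left out.

```python
import json


def parse_blocklist(text):
    """Return the unique CURIEs in `text`, in order of first appearance.

    Assumes (hypothetically) one CURIE per line, with blank lines and
    lines starting with '#' ignored.
    """
    curies = []
    for line in text.splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        if entry not in curies:
            curies.append(entry)
    return curies


def quote_term(value):
    """Wrap a value in double quotes so Solr treats it as a literal phrase."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def build_delete_payload(curies, field="curie"):
    """Build a Solr JSON update body that deletes every blocked CURIE.

    `field` is an assumed name for the Solr field holding the CURIE.
    Returns None when there is nothing to delete.
    """
    if not curies:
        return None
    terms = " OR ".join(quote_term(c) for c in curies)
    return {"delete": {"query": f"{field}:({terms})"}, "commit": {}}


if __name__ == "__main__":
    # Placeholder CURIEs, not real blocklist entries.
    sample = "# blocked terms\nEXAMPLE:0001\n\nEXAMPLE:0002\nEXAMPLE:0001\n"
    print(json.dumps(build_delete_payload(parse_blocklist(sample)), indent=2))
```

Because the deletion happens once at startup, every client of that instance sees the same reduced index.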

This means that the current blocklist applies to every user of that NameRes instance. As @sierra-moxon pointed out at https://github.com/NCATSTranslator/Blocklists/issues/12#issuecomment-1771737279, we might eventually want to support different blocklists for different MVPs. For example, it would be nonsensical to ask "what treats stillbirth", but it is very reasonable to ask "what prevents stillbirth", so a single term may need to be blocked for one MVP and allowed for another.

I think there are three ways to implement something like this:

  1. Deal with this in data modeling: either treat everything of type biolink:Disease as treatable, or create a new mixin (biolink:Treatable) to make the distinction explicit, and then only allow concepts of that type to appear in autocomplete for MVP1. The biggest downside to this approach is that blocklist changes will require a full Babel rebuild and Solr reload, which currently takes several days.
  2. Make separate blocklists for each MVP. Instead of being deleted, the "blocked" terms would be marked in Solr with e.g. mvp_only: [mvp1], and NameRes could then be called with an mvp=mvp1 parameter to control which filters are applied to the Solr queries. If no MVP is provided, the results are returned unfiltered.
  3. Don't do this in NameRes at all, but have the blocklists loaded by the UI or the Annotation Server. The UI would allow you to query e.g. "what treats stillbirth", but would be able to suggest that "what prevents stillbirth" might be the better query here.

I think option 2 would be the simplest to implement.
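A minimal sketch of how option 2 might look, under one possible reading of `mvp_only`: a document tagged `mvp_only: [mvp1]` is shown only when the request asks for `mvp1`, untagged documents are always shown, and a request without an MVP is unfiltered. The field name comes from the example above; the function names, the `curie` key in the sample documents, and the exact filter semantics are assumptions. `mvp_filter_query` returns a string that could be passed to Solr as an `fq` parameter, and `visible` applies the same rule in plain Python so the behavior can be checked without a Solr instance.

```python
import re


def mvp_filter_query(mvp):
    """Return a Solr filter query (fq) for the given MVP, or None for no filter."""
    if mvp is None:
        return None
    # Restrict MVP names to simple tokens so they are safe inside the query.
    if not re.fullmatch(r"[A-Za-z0-9_]+", mvp):
        raise ValueError(f"invalid MVP name: {mvp!r}")
    # Keep documents with no mvp_only field, plus those tagged for this MVP.
    return f"(*:* -mvp_only:[* TO *]) OR mvp_only:{mvp}"


def visible(docs, mvp):
    """Apply the same rule as mvp_filter_query() to in-memory documents."""
    if mvp is None:
        return list(docs)
    return [d for d in docs if not d.get("mvp_only") or mvp in d["mvp_only"]]


if __name__ == "__main__":
    # Placeholder documents, not real index entries.
    docs = [
        {"curie": "EXAMPLE:0001"},
        {"curie": "EXAMPLE:0002", "mvp_only": ["mvp1"]},
    ]
    print(mvp_filter_query("mvp2"))
    print([d["curie"] for d in visible(docs, "mvp2")])
```

Since the filter is applied per request rather than at load time, changing a blocklist would only require updating the `mvp_only` values in Solr, not a rebuild.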