adsabs / montysolr

Solr for Astrophysics Data System
https://ui.adsabs.harvard.edu

=author:"Wang, I" gets expanded to "Wang, Y" despite the equal sign #198

Open marblestation opened 1 year ago

marblestation commented 1 year ago

Solr (or the Solr supervisor process) generates a transliteration rule:

cd /app/conf # in adsnest montysolr container
grep -i ^'wang\\,\\ i=' author_generated.translit
wang\,\ i=>wang\,\ y

These synonyms are merged with the ones that we hand curate (which do not contain any Wang reference).
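To make the rule format concrete, here is a minimal sketch of how a line such as the one above could be read and merged with a curated table. It assumes that backslashes only escape the following character and that merging is a plain per-key union; the function names (`unescape`, `parse_rule`, `merge`) are hypothetical and are not taken from montysolr.

```python
import re


def unescape(term):
    """Drop the backslash in front of each escaped character (e.g. '\\,' -> ',')."""
    return re.sub(r"\\(.)", r"\1", term)


def parse_rule(line):
    """Split one 'source=>target' rule into its unescaped halves."""
    source, target = line.strip().split("=>", 1)
    return unescape(source), unescape(target)


def merge(generated, curated):
    """Union two {name: set_of_expansions} tables (assumed merge behaviour)."""
    merged = {key: set(values) for key, values in curated.items()}
    for key, values in generated.items():
        merged.setdefault(key, set()).update(values)
    return merged


source, target = parse_rule(r"wang\,\ i=>wang\,\ y")
generated = {source: {target}}
curated = {}  # the hand-curated file has no Wang entry
print(merge(generated, curated))
```

Under these assumptions the generated rule alone is enough to make "wang, i" expand to "wang, y", even though the curated table says nothing about Wang.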

When one uses the "=" syntax, we would expect that all synonyms and author name expansion would be skipped, but this is not the case.
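The expected behaviour can be sketched as follows. This illustrates the expectation described here, not what Solr currently does; the synonym table, `parse_author_query`, and `expand` are hypothetical names made up for the example.

```python
# Hypothetical synonym table holding only the generated rule from this report.
SYNONYMS = {"wang, i": ["wang, y"]}


def parse_author_query(query):
    """Return (field, value, exact) for queries like =author:"Wang, I"."""
    exact = query.startswith("=")
    field, _, value = query.lstrip("=").partition(":")
    return field, value.strip('"'), exact


def expand(name, exact):
    """Return the names to search for; exact queries skip all expansion."""
    key = name.lower()
    if exact:
        return [key]
    return [key] + SYNONYMS.get(key, [])


field, value, exact = parse_author_query('=author:"Wang, I"')
print(expand(value, exact))   # expected behaviour: no "wang, y"
print(expand(value, False))   # without "=", the synonym is added
```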

JCRPaquin commented 1 year ago

I'm not sure what the purpose of author_generated.translit is or how it was originally created. Is it still necessary to maintain it in addition to our hand-curated author name transliterations?

> When one uses the "=" syntax, we would expect that all synonyms and author name expansion would be skipped, but this is not the case.

Is this what we expect from user behavior (e.g. sampled user queries and results) or from our design? It might be possible to prevent synonym/transliteration expansion for queries, but it's hard to say that's the right course of action without assessing user impact.

JCRPaquin commented 6 months ago

Query annotations, like the exact search annotation (=), currently don't propagate to the query code: we can't access the query AST with the way the code is written today. I have a patch to enable propagating this information sitting on a branch, but it's not currently my priority, so it'll be a while before I post another update on this issue.

The patch in a nutshell: when executing the Query objects generated for each subquery, we provide a character stream. If we inject a wrapper that provides access to the AST node the Query object originated from, it becomes possible to traverse the character stream wrappers until we find the one providing the AST node reference.
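A rough sketch of the wrapper idea, written in Python for brevity rather than in the language of the actual patch. Every name here (`AstNode`, `StreamWrapper`, `AstAwareStream`, `find_ast_node`) is hypothetical and only illustrates the traversal; the code on the branch may look quite different.

```python
import io


class AstNode:
    """Stand-in for a parsed query node carrying the '=' annotation."""

    def __init__(self, field, value, exact):
        self.field = field
        self.value = value
        self.exact = exact


class StreamWrapper:
    """Generic wrapper that delegates reads to an inner character stream."""

    def __init__(self, inner):
        self.inner = inner

    def read(self, size=-1):
        return self.inner.read(size)


class AstAwareStream(StreamWrapper):
    """Wrapper that also remembers which AST node the text came from."""

    def __init__(self, inner, node):
        super().__init__(inner)
        self.node = node


def find_ast_node(stream):
    """Walk inward through the wrappers until one exposes an AST node."""
    while stream is not None:
        if isinstance(stream, AstAwareStream):
            return stream.node
        stream = getattr(stream, "inner", None)
    return None


node = AstNode("author", "Wang, I", exact=True)
# Other components may add their own wrappers on top of ours.
stream = StreamWrapper(AstAwareStream(io.StringIO("Wang, I"), node))
found = find_ast_node(stream)
print(found.exact if found else "no AST node attached")
```

With something like this in place, the code that applies synonyms could look up the node for the stream it is analyzing and skip expansion when the node is marked exact.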