- Per designs, accessible via toggle switch on the main search form
- Interacts appropriately with document filters and sort
- Highlights results in context
Questions
Do you see any problems with indexing/prefetching here? Does it seem slower than `develop` on your local machine after swapping out the solr config? It's been slow on the QA server but no slower for me locally, so maybe I'm missing something.
Additional notes
- `regex_search` is where the lucene regex query is built
- `transcription_regex` field is the only one searched in this mode
- `get_regex_highlight` is where the results are manually highlighted
- used a bit of fancy regex to get ~150 characters of context before and after the highlight, terminating at word boundaries
- team wanted to be able to search across multiple lines of transcription like PGPv3, so regex results in context do not display line numbers (also like PGPv3), and this was deemed an acceptable tradeoff
- updated the `clean_html` method to prevent extra whitespace getting added inside `` and `` tags, as it otherwise breaks formatting for highlights
- fwiw: performance using django ORM was about the same as solr for me locally, and the team confirmed it performs well in testing, so no need to reimplement
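
For reviewers who want a feel for the two regex pieces described above, here is a rough standalone sketch, not the code in this PR. The names `build_regex_query` and `highlight_with_context` are hypothetical stand-ins for `regex_search` and `get_regex_highlight`, the `<em>` wrapper is an assumption, and the context trimming is done with plain slicing rather than the single regex used in the PR. It also assumes `transcription_regex` is indexed as one untokenized string, so the Lucene pattern is padded with `.*` to allow partial matches (Lucene regexes match whole terms). Python's `re` syntax is not identical to Lucene's, so the highlight step is only an approximation for patterns that differ between the two.

```python
import re
from html import escape

CONTEXT_CHARS = 150  # approximate context on each side of a match


def build_regex_query(pattern, field="transcription_regex"):
    """Build a Solr/Lucene regex query string of the form field:/.*pattern.*/"""
    # forward slashes delimit a Lucene regex, so escape any inside the pattern
    escaped = pattern.replace("/", r"\/")
    # pad with .* because a Lucene regex has to match the entire indexed term
    return f"{field}:/.*{escaped}.*/"


def highlight_with_context(text, pattern, context=CONTEXT_CHARS):
    """Return one HTML snippet per match, with the match wrapped in <em> and
    up to `context` characters on each side, trimmed to whole words."""
    snippets = []
    # DOTALL lets "." in the user's pattern span transcription line breaks
    for match in re.finditer(pattern, text, flags=re.DOTALL):
        if match.start() == match.end():
            continue  # ignore empty matches
        start = max(0, match.start() - context)
        end = min(len(text), match.end() + context)
        before = text[start : match.start()]
        after = text[match.end() : end]
        # if the window cuts a word in half, drop the partial word at the edge
        if start > 0 and not text[start - 1].isspace():
            before = re.sub(r"^\S*", "", before)
        if end < len(text) and not text[end].isspace():
            after = re.sub(r"\S*$", "", after)
        snippets.append(
            f"{escape(before.lstrip())}<em>{escape(match.group(0))}</em>"
            f"{escape(after.rstrip())}"
        )
    return snippets
```

Because matches are found on the full text rather than line by line, a pattern such as `bar\s+baz` can match across a line break, which is why line numbers are not shown for these results.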
unrelated to regex search:
- we're now getting matches across multiple transcriptions on the same document sometimes, so I added a little ellipsis to the template in case that happens
- also added a feature flag and template logic/css for displaying relevance score
In this PR
Per #1631: