RegEx search (#1631)

In this PR

Per #1631:

Regex search capability
- Per designs, accessible via toggle switch on the main search form
- Interacts appropriately with document filters and sort
- Highlights results in context

Questions

Do you see any problems with indexing/prefetching here? Does it seem slower than `develop` on your local machine after swapping out the Solr config? It's been slow on the QA server but no slower for me locally, so maybe I'm missing something.

Additional notes

- `regex_search` is where the Lucene regex query is built
- `transcription_regex` field is the only one searched in this mode
- `get_regex_highlight` is where the results are manually highlighted
- used a bit of fancy regex to get ~150 characters of context before and after the highlight, terminating at word boundaries
- team wanted to be able to search across multiple lines of transcription like PGPv3, so regex results in context do not display line numbers (also like PGPv3), and this was deemed an acceptable tradeoff
- updated the `clean_html` method to prevent extra whitespace from being added inside certain tags, as it otherwise breaks formatting for highlights
- fwiw: performance using Django ORM was about the same as Solr for me locally, and the team confirmed it performs well in testing, so no need to reimplement

Unrelated to regex search:

- we're now getting matches across multiple transcriptions on the same document sometimes, so I added a little ellipsis to the template in case that happens
- also added a feature flag and template logic/css for displaying relevance score
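For reviewers who want a mental model of the two regex steps in the notes above, here is a minimal sketch. It is not the code in this PR: `build_regex_query` and `highlight_in_context` are hypothetical names, the `<em>` wrapper is an assumed highlight tag, and only the `transcription_regex` field name and the ~150-character window come from the notes. It also uses Python's `re` for the highlighting step, whose syntax is not identical to Lucene's, so a real implementation has to account for patterns that behave differently in the two engines.

```python
import html
import re


def build_regex_query(pattern: str, field: str = "transcription_regex") -> str:
    """Wrap a user pattern as a Lucene regex clause (hypothetical helper)."""
    # Lucene delimits regex queries with forward slashes, so escape any
    # slash in the pattern that is not already preceded by a backslash.
    escaped = re.sub(r"(?<!\\)/", r"\\/", pattern)
    # Lucene regexes are matched against the whole indexed term, so pad
    # with .* to allow a match anywhere in the field value.
    return f"{field}:/.*{escaped}.*/"


def highlight_in_context(text: str, pattern: str, context: int = 150) -> list[str]:
    """Return one snippet per match, with the match wrapped in <em>."""
    snippets = []
    # DOTALL lets "." in the pattern span line breaks, so a match can
    # cross multiple lines of transcription.
    for m in re.finditer(pattern, text, flags=re.DOTALL):
        if m.start() == m.end():
            continue  # nothing to highlight for an empty match
        start = max(0, m.start() - context)
        end = min(len(text), m.end() + context)
        before = text[start:m.start()]
        after = text[m.end():end]
        # If the window starts mid-word, drop the partial word.
        if start > 0 and not text[start - 1].isspace():
            before = re.sub(r"^\S*\s*", "", before, count=1)
        # If the window ends mid-word, drop the partial word.
        if end < len(text) and not text[end].isspace():
            after = re.sub(r"\s*\S*\Z", "", after, count=1)
        # Escape the text so only the <em> tags are treated as markup.
        snippets.append(
            f"{html.escape(before)}<em>{html.escape(m.group(0))}</em>{html.escape(after)}"
        )
    return snippets
```

Slicing around each match and then trimming the edges keeps the windowing logic separate from the user's pattern, which avoids embedding an arbitrary pattern inside a larger context regex. The tradeoff is that a match in the middle of a very long unbroken word can lose all of its leading or trailing context.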

Princeton-CDH / geniza

RegEx search (#1631) #1588
