flexion / ef-cms

An Electronic Filing / Case Management System.

SPIKE: Elastic Search Special Character Usage #8630

Closed cholly75 closed 3 years ago

cholly75 commented 3 years ago

As a DAWSON project member, I need to understand the relationship between 'special characters' (. - , ' ; : ( ) * &) and how Elasticsearch parses these characters in both a query context and a content context, so that I can effectively design and code search.

Timebox: 4 hours

Some reference to help:

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html#supported-flags

Pre-Conditions

Acceptance Criteria

  1. Team has an understanding of how ES interprets special characters as inputs into search query.
  2. Team has an understanding of how ES interprets special characters in content index.
  3. Team has an understanding of how to manipulate and work with special characters in ES coding/configuration.
  4. Document what we learn: start in story, then markdown file in documents directory.

Mobile Design/Considerations

IRS API Considerations

Do these changes impact the IRS API?

Security Considerations

Notes

Tasks

Definition of Done (Updated 4-14-21)

Product Owner

UX

Engineering

mrinsin commented 3 years ago
  1. Using the whitespace tokenizer in a custom analyzer would return results for search keywords that contain special characters, rather than ignoring special characters (as the current configuration does).
  2. When searching a field that is NOT analyzed, reserved characters can be escaped with a backslash. If an exact match on the keyword is required, the keyword can instead be wrapped in double quotes. (For more information on when analyzers are used vs not, this link is useful) Example:

image

In this case, the documentContents field is analyzed using a custom analyzer, whereas the docketNumberWithSuffix field is not.
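As a rough illustration of points 1 and 2, an index definition along the following lines would analyze documentContents with a custom analyzer built on the whitespace tokenizer while leaving docketNumberWithSuffix unanalyzed. This is a sketch, not the project's actual mapping: the analyzer name `whitespace_lowercase`, the `lowercase` filter, and the `keyword` field type are assumptions. The helper `approximateWhitespaceTokens` is a hypothetical local stand-in that only mimics what the analyzer would emit; it does not call Elasticsearch.

```typescript
// Hypothetical index body; analyzer name, filter choice, and field types are assumptions.
const indexBody = {
  settings: {
    analysis: {
      analyzer: {
        // Custom analyzer using the built-in whitespace tokenizer, which splits
        // only on whitespace and therefore keeps punctuation inside tokens.
        whitespace_lowercase: {
          type: 'custom',
          tokenizer: 'whitespace',
          filter: ['lowercase'],
        },
      },
    },
  },
  mappings: {
    properties: {
      // Analyzed field: tokens keep characters such as ' . -
      documentContents: { type: 'text', analyzer: 'whitespace_lowercase' },
      // Not analyzed: the whole value is indexed as a single term.
      docketNumberWithSuffix: { type: 'keyword' },
    },
  },
};

// Local approximation of the analyzer above, for illustration only.
function approximateWhitespaceTokens(text: string): string[] {
  return text
    .trim()
    .split(/\s+/)
    .filter(token => token.length > 0)
    .map(token => token.toLowerCase());
}

// Special characters survive tokenization instead of being dropped.
const exampleTokens = approximateWhitespaceTokens("O'Brien v. Smith-Jones");
```

With this kind of analyzer a search for `smith-jones` has a matching token to hit. The trade-off is that `smith` alone would no longer match that token without a prefix or wildcard.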

  3. Depending on the business need, we may need a way to use certain reserved characters as they are interpreted by ES (e.g. -, !). If that is something the court wants or needs, there are tools to do so (e.g. specifying it in the query or in the analyzer).
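One way to "specify in query" is the `flags` parameter of `simple_query_string` (see the supported-flags link in the story above), which limits which operators ES will interpret. The sketch below builds such a query body. The function name `buildSimpleQuery` and the example field list are hypothetical, and nothing here reflects how DAWSON actually constructs its queries.

```typescript
// Flag names as listed in the simple_query_string documentation.
type SimpleQueryStringFlag =
  | 'AND' | 'OR' | 'NOT' | 'PHRASE' | 'PREFIX' | 'PRECEDENCE'
  | 'ESCAPE' | 'WHITESPACE' | 'FUZZY' | 'NEAR' | 'SLOP';

// Hypothetical helper: builds a query body in which only the listed operators
// are enabled. An empty list maps to NONE, so every character is literal text.
function buildSimpleQuery(
  query: string,
  fields: string[],
  flags: SimpleQueryStringFlag[],
) {
  return {
    query: {
      simple_query_string: {
        query,
        fields,
        // Multiple flags are combined with a | separator.
        flags: flags.length > 0 ? flags.join('|') : 'NONE',
      },
    },
  };
}

// Enable only negation (-) and phrase quotes ("); other operators such as
// * or | would not be interpreted as operators.
const exampleQuery = buildSimpleQuery('tax -penalty', ['documentContents'], ['NOT', 'PHRASE']);
```

Restricting the flags is a query-side choice and does not change what was indexed. Whether a character survives in the index still depends on the analyzer.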

Reserved chars in ES simple query string: image
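For reference, the simple query string syntax documents these operators: `+` (AND), `|` (OR), `-` (negation), `"` (phrase), `*` (prefix, at the end of a term), `(` and `)` (precedence), and `~N` (fuzziness after a word, slop after a phrase). Of the characters listed in the story, that covers only `-`, `(`, `)` and `*`. A minimal sketch of the two options from point 2 (backslash escaping, or wrapping in double quotes) might look like the following. Both helper names are hypothetical, and escaping the backslash itself is an added precaution rather than something taken from the spike.

```typescript
// Operator characters of simple_query_string, plus the backslash itself.
const SIMPLE_QUERY_RESERVED = /[+|\-"*()~\\]/g;

// Hypothetical helper: prefix each reserved character with a backslash so it
// is treated as literal text rather than as an operator.
function escapeSimpleQueryString(input: string): string {
  return input.replace(SIMPLE_QUERY_RESERVED, '\\$&');
}

// Hypothetical helper: wrap the keyword in double quotes for an exact match,
// escaping any embedded quotes or backslashes first.
function asExactPhrase(input: string): string {
  return '"' + input.replace(/["\\]/g, '\\$&') + '"';
}

// The hyphen in a docket-number-like value would otherwise risk being read as negation.
const escapedDocketNumber = escapeSimpleQueryString('123-20S');
const quotedDocketNumber = asExactPhrase('123-20S');
```

Characters such as `.`, `,`, `'`, `;`, `:` and `&` pass through unchanged because they are not simple query string operators. Whether they match anything depends on how the target field was analyzed at index time.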