markharwood closed this issue 2 years ago.
Pinging @elastic/es-search (:Search/Search)
Given our current roadmap and near-term focus, we are unlikely to get to this request anytime soon. This issue has been open for 2 years with no action taken, and from our triage, it doesn't look like it blocks any critical feature or implementation. We'll go ahead and close it but please re-open or add a comment if you are awaiting its resolution.
Regular expression queries are likely to become much more popular with the forthcoming introduction of the wildcard field. In Lucene we opened up access to some of the internals to get detailed parser information, and in Kibana we have proposed a UI to help users write and test regular expressions. This issue is intended to provide the glue between these two pieces of work by exposing an API that returns a structured representation of a regex string. The input is a plain regular expression string; the output is a structured JSON object representing the parse tree of RegExp objects, including the type of expression each node represents.
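As a rough illustration of the kind of structure such an API could return, here is a minimal Python sketch that parses a small regex subset (literals, `.`, `|`, `*`, `+`, `?`, parentheses and backslash escapes) into nested dicts that serialize directly to JSON. It is not Lucene's `RegExp` parser: `parse_regex` and the node names (`union`, `concatenation`, `repeat`, `optional`, `char`, `anychar`, `empty`) are hypothetical, and character classes, intervals and Lucene-specific operators are not handled.

```python
import json


def parse_regex(pattern: str) -> dict:
    """Parse a small regex subset into a JSON-ready parse tree (hypothetical node names)."""
    pos = 0

    def peek():
        return pattern[pos] if pos < len(pattern) else None

    def union():
        # union := concat ('|' concat)*
        nonlocal pos
        branches = [concat()]
        while peek() == "|":
            pos += 1
            branches.append(concat())
        return branches[0] if len(branches) == 1 else {"type": "union", "children": branches}

    def concat():
        # concat := repeat*   (stops at '|', ')' or end of input)
        items = []
        while peek() is not None and peek() not in "|)":
            items.append(repeat())
        if not items:
            return {"type": "empty"}
        return items[0] if len(items) == 1 else {"type": "concatenation", "children": items}

    def repeat():
        # repeat := atom ('*' | '+' | '?')*
        nonlocal pos
        node = atom()
        while peek() is not None and peek() in "*+?":
            op = peek()
            pos += 1
            if op == "?":
                node = {"type": "optional", "child": node}
            else:
                node = {"type": "repeat", "min": 0 if op == "*" else 1, "child": node}
        return node

    def atom():
        nonlocal pos
        ch = peek()
        if ch == "(":
            pos += 1
            node = union()
            if peek() != ")":
                raise ValueError(f"missing ')' at position {pos}")
            pos += 1
            return node
        if ch == ".":
            pos += 1
            return {"type": "anychar"}
        if ch == "\\":
            # Simplification: any escaped character is taken literally, so \w becomes 'w'.
            if pos + 1 >= len(pattern):
                raise ValueError("dangling escape at end of pattern")
            pos += 2
            return {"type": "char", "char": pattern[pos - 1]}
        if ch in "*+?":
            raise ValueError(f"nothing to repeat at position {pos}")
        pos += 1
        return {"type": "char", "char": ch}

    tree = union()
    if pos != len(pattern):
        raise ValueError(f"unexpected {pattern[pos]!r} at position {pos}")
    return tree


if __name__ == "__main__":
    print(json.dumps(parse_regex("ab+|c"), indent=2))
```

Note how the escape branch treats `\w` as a plain `w`: a tree view makes that kind of silent reinterpretation visible to the user, which is the point of exposing the parse tree.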
It's not clear to me that this falls neatly into either the existing validate or explain APIs for queries. My assumption is that the breakdown of a regex should not be tied to a particular Elasticsearch index or field; it's only about explaining the internal complexity of a given regex string. It's possible that we should also provide an API to test a given regex against a string that clients want to match. Clients like Kibana could then supply the test string from any source.
Why can't users just use existing online regex testing tools?
Online testing tools like https://regex101.com/ are useful (and worth emulating), but Lucene's regex support is not complete, so it is useful to have a tool that accurately represents the features that are actually available.
API outline
A new `_regex_test` endpoint would take a regex and an optional value to test it against. The result could look like this:
The parse tree represents a parsed form of the regular expression logic.
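To show how an optional test value could be checked against such a tree without relying on a host-language regex engine, here is a hypothetical Python sketch. It assumes a tree of nested dicts with a `type` key (`union`, `concatenation`, `repeat` with a `min` count, `optional`, `char`, `anychar`, `empty`), which is an illustrative shape and not an existing Elasticsearch format. It simulates the tree as a set of reachable string positions, much as an automaton tracks a set of states, and reports a match only when the whole value is consumed, because Lucene regular expressions are anchored to the entire term. The `ignore_case` parameter is an illustrative flag in the spirit of the backend option of that name.

```python
from typing import Iterable


def _ends(node: dict, text: str, starts: Iterable[int], ignore_case: bool) -> set[int]:
    """Return every position reachable after matching `node` from any of `starts`."""
    kind = node["type"]
    starts = set(starts)
    if kind == "empty":
        return starts
    if kind == "anychar":
        return {i + 1 for i in starts if i < len(text)}
    if kind == "char":
        want = node["char"].lower() if ignore_case else node["char"]
        out = set()
        for i in starts:
            if i < len(text):
                got = text[i].lower() if ignore_case else text[i]
                if got == want:
                    out.add(i + 1)
        return out
    if kind == "concatenation":
        # Feed the end positions of each child into the next child.
        for child in node["children"]:
            starts = _ends(child, text, starts, ignore_case)
        return starts
    if kind == "union":
        out = set()
        for child in node["children"]:
            out |= _ends(child, text, starts, ignore_case)
        return out
    if kind == "optional":
        return starts | _ends(node["child"], text, starts, ignore_case)
    if kind == "repeat":
        # Mandatory iterations first, then grow the reachable set to a fixed point.
        for _ in range(node["min"]):
            starts = _ends(node["child"], text, starts, ignore_case)
        reached, frontier = set(starts), set(starts)
        while frontier:
            frontier = _ends(node["child"], text, frontier, ignore_case) - reached
            reached |= frontier
        return reached
    raise ValueError(f"unknown node type: {kind!r}")


def full_match(tree: dict, value: str, ignore_case: bool = False) -> bool:
    """True if the whole of `value` matches the tree (anchored, like Lucene regexes)."""
    return len(value) in _ends(tree, value, {0}, ignore_case)


# Hand-written tree for the pattern ab+|c
EXAMPLE_TREE = {
    "type": "union",
    "children": [
        {"type": "concatenation", "children": [
            {"type": "char", "char": "a"},
            {"type": "repeat", "min": 1, "child": {"type": "char", "char": "b"}},
        ]},
        {"type": "char", "char": "c"},
    ],
}

if __name__ == "__main__":
    for value in ["abbb", "c", "abc", "ABB"]:
        print(value, full_match(EXAMPLE_TREE, value), full_match(EXAMPLE_TREE, value, ignore_case=True))
```

Tracking sets of positions instead of backtracking keeps the sketch terminating even for repeats of sub-patterns that can match the empty string, such as `()*`.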
New query type - equivalent of interval queries for regex
Parsed regex output like the example above could potentially be used as a new query type that creates `AutomatonQuery` objects when executed. End users wouldn't be expected to type this JSON, but it's not hard to imagine a Kibana UI that takes a regex string as a fast form of data entry and then renders the JSON response graphically for review and further editing.
The advantages of working with a graphical representation are:
1) The user can review the logic of the regex parser's interpretation (for years Lucene silently failed on the `\w` syntax).
2) We can expose existing backend features like `ignore_case` with a checkbox, which we've failed to add to simple string syntaxes like KQL and Lucene `query_string`. (Simple query string expressions can't scale in complexity: they require too many "special" control characters.)
3) We can expose Automaton features that go beyond regex syntax, e.g. fuzzy matching.
Conclusion
Complex character sequence matching is a common practice in the security field and warrants a better UX than the cryptic, black-box approach of writing regex strings. These Elasticsearch changes would lay the foundation for big improvements in the user experience.