elastic / elasticsearch-net

This strongly typed client library enables working with Elasticsearch. It is the official client maintained and supported by Elastic.
https://www.elastic.co/guide/en/elasticsearch/client/net-api/current/index.html
Apache License 2.0

Encode function for TermsInclude regexes. #8302

Closed lanwin closed 1 month ago

lanwin commented 1 month ago

Is your feature request related to a problem? Please describe.
We have a list of TermsInclude values (currently on V7) and we want to get all terms that start with one of our words (so basically word + .*). The problem is that these words can contain regex characters such as . or ?, and we need to escape them correctly. As far as I can see, there is no method to do that.

Describe the solution you'd like
I think there should be a method that correctly escapes regex characters for use inside Elasticsearch.

Describe alternatives you've considered
We are currently doing this: Regex.Escape(word).Replace("\"", "\\\""). But since .NET regexes and Elasticsearch regexes are not identical, this cannot be a 100% match.


flobernd commented 1 month ago

Hi @lanwin,

thanks for the feature request, but I don't think this is something that should be included in the library.

This page contains a list of all characters that must be escaped: https://www.elastic.co/guide/en/elasticsearch/reference/current/regexp-syntax.html

Should be straightforward to implement a custom escape function:

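string RegexEscape(string input)
{
    // Character class of everything reserved by the Elasticsearch regexp syntax,
    // including the optional operators # @ & < > ~ (requires System.Text.RegularExpressions).
    var re = new Regex(@"[.?+*|{}\[\]()""\\#@&<>~]", RegexOptions.Compiled);

    // Prefix every reserved character with a backslash.
    return re.Replace(input, match => "\\" + match.Value);
}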

For better performance, the regex should be stored in a static member, for example one generated with the [GeneratedRegex] attribute (available from .NET 7). Regex.Replace is probably not the fastest approach either, but this should give you the basic idea.
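As a rough illustration of the "not the fastest way" remark, the same escaping can be done without a regex by walking the string once. This is only a sketch: the class and method names (EsRegexEscaper, Escape, PrefixPattern) are made up for this example and are not part of the client library, and the reserved-character list is the one from the regexp syntax page linked above.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of Elastic.Clients.Elasticsearch or NEST.
public static class EsRegexEscaper
{
    // Reserved characters of the Elasticsearch regexp syntax,
    // including the optional operators # @ & < > ~.
    private const string Reserved = ".?+*|{}[]()\"\\#@&<>~";

    // Returns the input with a backslash in front of every reserved character.
    public static string Escape(string input)
    {
        var sb = new StringBuilder(input.Length + 8);
        foreach (var c in input)
        {
            if (Reserved.IndexOf(c) >= 0)
            {
                sb.Append('\\');
            }
            sb.Append(c);
        }
        return sb.ToString();
    }

    // Builds the "starts with word" pattern from the original question (word + .*).
    public static string PrefixPattern(string word) => Escape(word) + ".*";
}
```

With this sketch, EsRegexEscaper.PrefixPattern("a.b?") yields a\.b\?.* and the result could then be passed wherever the TermsInclude pattern is set.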