v-echo opened this issue 3 years ago (status: Open)
Here is a reproduction scenario:
# (1) create an index with a text property
curl -X PUT "localhost:9200/es-66159?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "content": { "type": "text" }
    }
  }
}
'
# (2) index a document
# store the document as doc.json (quoted heredoc, so the shell expands nothing)
cat > doc.json <<'EOF'
{
  "content": "['4Q6109372363778', '1f21', '2e01928391087509127405123521353526798', '4:Q6109372363778', '4/Q/6109372363778', '4 Q 6109372363778', '1 e 231']"
}
EOF
curl -X POST "localhost:9200/es-66159/_doc?pretty" -H 'Content-Type: application/json' --data-binary "@doc.json"
# (3) issue regexp query
curl -X GET "localhost:9200/es-66159/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "regexp": {
      "content": {
        "value":"[0-9](\\/|\\:| |)[^aboiyzABOIYZ0-9\\[-\\` -@](\\/|\\:| |)[0-9]{2,}"
      }
    }
  }
}
'
Pinging @elastic/es-search (Team:Search)
Pinging @elastic/es-search-relevance (Team:Search Relevance)
Elasticsearch version (bin/elasticsearch --version): 7.6.2
JVM version (java -version): Embedded
OS version (uname -a if on a Unix-like system): Windows Server 2016
Description of the problem including expected versus actual behavior:
When executing the following query:
The following error is returned:
The same expression parses and matches correctly on https://regex101.com/. The documentation at https://www.elastic.co/guide/en/elasticsearch/reference/current/regexp-syntax.html does not make clear what the problem is. On further testing, the expression does parse if the last | inside each group is removed or escaped, but it then no longer matches correctly, because the meaning of the symbol changes.
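One possible workaround, sketched here under assumptions, is to replace the trailing empty alternative in each group with an optional group: `(\/|\:| )?` instead of `(\/|\:| |)`. The snippet below uses Python's `re` module, which is not the Lucene engine Elasticsearch uses, so it only checks that the two forms accept the same strings in a conventional engine. Whether the rewritten form avoids the parse error on 7.6.2 has not been verified here. The names `ORIGINAL`, `REWRITTEN`, `SAMPLES` and `matches` are illustrative and do not come from the issue.

```python
import re

# Character class copied from the query in the issue: excludes the letters
# a, b, o, i, y, z (both cases), digits, the range '[' to '`', and ' ' to '@'.
LETTER = r"[^aboiyzABOIYZ0-9\[-\` -@]"

# Shape used in the issue: separator group ending in an empty alternative.
ORIGINAL = r"[0-9](\/|\:| |)" + LETTER + r"(\/|\:| |)[0-9]{2,}"

# Candidate rewrite (an assumption, not confirmed against Elasticsearch):
# the same separators, made optional with '?'.
REWRITTEN = r"[0-9](\/|\:| )?" + LETTER + r"(\/|\:| )?[0-9]{2,}"

# Sample values taken from the document indexed in the reproduction scenario.
SAMPLES = [
    "4Q6109372363778",
    "1f21",
    "2e01928391087509127405123521353526798",
    "4:Q6109372363778",
    "4/Q/6109372363778",
    "4 Q 6109372363778",
    "1 e 231",
]


def matches(pattern: str, text: str) -> bool:
    """Return True if the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None


if __name__ == "__main__":
    for sample in SAMPLES:
        # Print the sample and whether each form accepts it.
        print(sample, matches(ORIGINAL, sample), matches(REWRITTEN, sample))
```

If the rewrite does behave the same way in Lucene, the JSON value in step (3) would become `"[0-9](\\/|\\:| )?[^aboiyzABOIYZ0-9\\[-\\` -@](\\/|\\:| )?[0-9]{2,}"`. Treat that as something to try, not as a confirmed fix.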
The 'content' field mapping is simple, almost default:
For reference/testing, what the expression should match are UTM coordinates. Taken from here: