elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Reindex API ignores the index name changed by script #69819

Open damienalexandre opened 3 years ago

damienalexandre commented 3 years ago

Elasticsearch version (bin/elasticsearch --version): 7.8.1 (but affects all versions)

Plugins installed: []

JVM version (java -version): bundled

OS version (uname -a if on a Unix-like system): 20.04.1-Ubuntu

Description of the problem including expected versus actual behavior:

When using the Reindex API to move multiple indices from a remote to a local cluster, we can use a wildcard in the source index name parameter.

In the destination we cannot, but that's OK because a script can be used to set the index name per document.

But if your script sets exactly the same index name as the remote index, it is completely ignored and all your documents are sent to one and only one index (the one given in dest.index), without any warning or error.

I had a hard time figuring out why my script wasn't working and narrowed it down to this:

https://github.com/elastic/elasticsearch/blob/2dbd59bbe167b1942c9725693cc1e600856d3554/modules/reindex/src/main/java/org/elasticsearch/index/reindex/AbstractAsyncBulkByScrollAction.java#L762-L764

If the index name set by a script is the same as the index name from the document, nothing is updated. That's probably good when the source is not a wildcard, but when it is, that's problematic!
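The effect of that check can be modelled in a few lines. The following Python sketch is a simplified model, not the Elasticsearch source: `resolve_dest_index` and its parameters are hypothetical names, and it assumes the outgoing request is pre-populated with `dest.index` and only overwritten when the script's value differs from the document's original index.

```python
def resolve_dest_index(source_index: str, dest_index: str, script_index: str) -> str:
    """Return the index a reindexed document ends up in (simplified model).

    source_index: index the document was read from (ctx._index before the script)
    dest_index:   dest.index from the _reindex request body
    script_index: value of ctx._index after the script ran
    """
    # Assumption: the outgoing request already targets dest.index.
    target = dest_index
    # The script's value is applied only when it differs from the source index.
    if script_index != source_index:
        target = script_index
    return target


# Script re-assigns the same name: treated as "unchanged", so dest.index wins.
print(resolve_dest_index("metricbeat-2016.05.30", "metricbeat", "metricbeat-2016.05.30"))
# Script assigns a different name: the script's value is used.
print(resolve_dest_index("metricbeat-2016.05.30", "metricbeat", "archive-2016.05.30"))
```

Under this model, "the script set the same name" and "the script did not touch ctx._index" are indistinguishable, which is why the script appears to be ignored.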

Steps to reproduce:

Example adapted from the documentation:

PUT metricbeat-2016.05.30/beat/1?refresh
{"system.cpu.idle.pct": 0.908}
PUT metricbeat-2016.05.31/beat/2?refresh
{"system.cpu.idle.pct": 0.105}

POST _reindex
{
  "source": {
    "index": "metricbeat-*"
  },
  "dest": {
    "index": "metricbeat"
  },
  "script": {
    "lang": "painless",
    "inline": "ctx._index = 'metricbeat-' + (ctx._index.substring('metricbeat-'.length(), ctx._index.length()))"
  }
}

GET metricbeat*/_search

We expect only 2 indices with a document each (reindexed "in place").

We get 3 indices: a new metricbeat index is created unexpectedly.

No warnings or errors are triggered.
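To make the gap between expected and observed results concrete, here is a small Python replay of the reproduction. It is only a sketch: `script` is a hand transliteration of the Painless expression, and `observed_target` encodes the assumed rule that a script value equal to the source index is treated as "no change".

```python
from collections import Counter

PREFIX = "metricbeat-"
DEST_INDEX = "metricbeat"  # dest.index from the request above


def script(ctx_index: str) -> str:
    # Python transliteration of the Painless script: rebuilds the same name.
    return PREFIX + ctx_index[len(PREFIX):]


def observed_target(source_index: str) -> str:
    # Assumed model of the reported behavior: a script value equal to the
    # source index counts as "no change", so dest.index is kept.
    new_index = script(source_index)
    return DEST_INDEX if new_index == source_index else new_index


sources = ["metricbeat-2016.05.30", "metricbeat-2016.05.31"]
expected = Counter(script(s) for s in sources)           # what the script asks for
observed = Counter(observed_target(s) for s in sources)  # where documents land

print(dict(expected))
print(dict(observed))
```

In this model both documents land in metricbeat, and since the two dated source indices still exist, the search sees three indices in total.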

Ref #18654 #19662

elasticmachine commented 3 years ago

Pinging @elastic/es-distributed (Team:Distributed)

bhiravabhatla commented 3 years ago

Any update on this?

PATAPOsha commented 2 weeks ago

Very annoying bug. I spent a couple of hours thinking that the script simply doesn't work. Why can't we use the same index name as the source doc? At the very least it should throw an error instead of falling back to the default index name provided in dest.index.