Open shribigb opened 6 years ago
The issue is caused by the inner map file_flag_join in the params of the script.
If this inner map is assigned to the context document and then modified to add new fields, the original params are modified as well, because both refer to the same map. It works when slicing is off because each execution overrides the new field with its own value, but when multiple slices execute the same script they modify this inner map concurrently, which leads to a race condition.
The solution is to clone the map in the source like this:
ctx._source.file_flag_join = new HashMap(params.file_flag_join)
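To make the difference concrete, here is a minimal plain-Java sketch (not Painless, and not Elasticsearch code) of assigning the shared map versus copying it. The class and method names, the `name` key, and the sample ids are made up for illustration; only `file_flag_join` and `parent` come from this issue.

```java
import java.util.HashMap;
import java.util.Map;

public class ParamsAliasingSketch {

    // Mimics assigning params.file_flag_join directly into the document and
    // then adding a field: the document and params share one map, so the
    // write is visible through params as well.
    static Map<String, Object> assignByReference(Map<String, Object> params, String parentId) {
        @SuppressWarnings("unchecked")
        Map<String, Object> join = (Map<String, Object>) params.get("file_flag_join");
        Map<String, Object> source = new HashMap<>();
        source.put("file_flag_join", join);
        join.put("parent", parentId); // also mutates the map held by params
        return source;
    }

    // Mimics the suggested fix: copy the inner map first (a shallow copy is
    // enough here because the inner map only holds flat values).
    static Map<String, Object> assignCopy(Map<String, Object> params, String parentId) {
        @SuppressWarnings("unchecked")
        Map<String, Object> shared = (Map<String, Object>) params.get("file_flag_join");
        Map<String, Object> join = new HashMap<>(shared);
        Map<String, Object> source = new HashMap<>();
        source.put("file_flag_join", join);
        join.put("parent", parentId); // only the per-document copy changes
        return source;
    }

    public static void main(String[] args) {
        Map<String, Object> inner = new HashMap<>();
        inner.put("name", "file_flag"); // hypothetical join relation name
        Map<String, Object> params = new HashMap<>();
        params.put("file_flag_join", inner);

        assignCopy(params, "doc-1");
        System.out.println("params changed after copy: " + inner.containsKey("parent"));

        assignByReference(params, "doc-2");
        System.out.println("params changed after direct assignment: " + inner.containsKey("parent"));
    }
}
```

With the direct assignment, every slice running the script writes `parent` into the same shared map, so one slice can observe a value written by another before its document is indexed.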
We could force a deep copy of the params before execution or make the inner map immutable to avoid this trappy behavior. @nik9000 WDYT?
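As a rough illustration of the two options, the following plain-Java sketch shows a recursive deep copy and a recursively read-only view of a nested params map. It is only a sketch under simplifying assumptions: it handles nested maps but not lists, the class and method names are hypothetical, and it is not how Elasticsearch builds script params.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class ParamsProtectionSketch {

    // Option 1: give each execution its own deep copy, so writes never reach
    // the shared params. Costs one copy per execution.
    static Map<String, Object> deepCopy(Map<String, Object> original) {
        Map<String, Object> copy = new HashMap<>();
        for (Map.Entry<String, Object> entry : original.entrySet()) {
            Object value = entry.getValue();
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> nested = (Map<String, Object>) value;
                copy.put(entry.getKey(), deepCopy(nested));
            } else {
                copy.put(entry.getKey(), value);
            }
        }
        return copy;
    }

    // Option 2: build the params once as read-only maps, so any write from a
    // script fails fast instead of racing. Scripts that relied on mutating
    // params would break.
    static Map<String, Object> deepUnmodifiable(Map<String, Object> original) {
        Map<String, Object> wrapped = new HashMap<>();
        for (Map.Entry<String, Object> entry : original.entrySet()) {
            Object value = entry.getValue();
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> nested = (Map<String, Object>) value;
                wrapped.put(entry.getKey(), deepUnmodifiable(nested));
            } else {
                wrapped.put(entry.getKey(), value);
            }
        }
        return Collections.unmodifiableMap(wrapped);
    }
}
```

The trade-off matches the one raised in this thread: the copy keeps existing scripts working at the price of extra allocation per document, while the read-only variant is cheaper but turns the silent race into an `UnsupportedOperationException` for any script that writes to params.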
I think it'd be breaking to make those maps immutable but it is pretty tempting. It feels cleaner than making a copy every time though.
@nik9000 what should we do here? This feels like a serious bug.
Either of the things I suggested should work to be honest. It might be better to make a script context for reindex and see if that solves the problem. It might. And if it doesn't we can build on it from there.
Pinging @elastic/es-distributed
@henningandersen and I discussed this issue during a triage session. We think it should be moved to the script label, since it seems to be related to how the script is set up to use a non-thread-safe data structure (maybe this should be forbidden?).
Pinging @elastic/es-core-infra (Team:Core/Infra)
This is an ongoing issue with params; we should provide a way for users to access params safely. Since we have metadata() and field(<path>) as replacements for ctx, perhaps a fields-like params API would be worthwhile.
Elasticsearch version (bin/elasticsearch --version): 6.1.3
JVM version (java -version): 1.8-56
OS version (uname -a if on a Unix-like system): Ubuntu 16 LTS
Description of the problem including expected versus actual behavior:
When creating child documents using the reindex API with slicing and Painless scripting, I noticed that the value of the routing id was different from the parent id. When I retried without slicing, I saw the expected behavior. To isolate the issue, I created a single-node, single-shard index.
Steps to reproduce:
If you notice, _routing and file_flag_join.parent have different values.
If I remove slicing and try again, I don't see this behavior: the routing and parent values are consistent.