Closed voran closed 1 year ago
Looking through the file history, it seems like this might be addressed by https://github.com/elastic/elasticsearch/pull/95114. Fingers crossed for a new release soon, as this is causing a production issue for us.
Also, it may be a good idea to put this as a known issue for version 8.7.1 to prevent others from ending up where we are.
Thank you for reporting the issue. I am closing this issue, as it appears that the problem was fixed in #95114.
> Also, it may be a good idea to put this as a known issue for version 8.7.1 to prevent others from ending up where we are.
I agree, I opened https://github.com/elastic/elasticsearch/pull/96448.
Elasticsearch Version
8.7.1
Installed Plugins
none
Java Version
bundled
OS Version
docker.elastic.co/elasticsearch/elasticsearch:8.7.1
Problem Description
We have a cluster with about 20 nodes (4 vCPU, 32GB RAM) on Google Cloud running version 8.4.2 and an index with a parent-child relationship and about 6TB of data (60 shards, 1 replica) that was indexed using an older version (8.3.2).
We recently upgraded to version 8.7.1 via rolling restart, and when we try to reindex, we see consistent errors in some shards as follows:
java.lang.ArrayIndexOutOfBoundsException: arraycopy: length -3 is negative
(see logs for full traces). The cluster is green and there is no other indication of shard corruption. We were able to reindex from this same index successfully before the upgrade from 8.4.2 to 8.7.1.
In all cases, the message is "length -3 is negative".
Both stack traces refer to RecyclerBytesStreamOutput, and this is something that was changed in this version (https://github.com/elastic/elasticsearch/pull/95036). Without being experts, we suspect that this PR introduces a bug which causes this particular failure.
Steps to Reproduce
I cannot share a minimal dataset to reproduce this yet, but here is the output of GET /<index>, including the index settings and mapping.
Presumably, if you:
You will get the same error for some of the reindex slices. If you run the reindex without slicing, it would still fail with the same error.
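For readers unfamiliar with sliced reindexing, here is a minimal sketch of what a sliced versus an unsliced reindex request could look like. This is not the exact request from this report: the index names and the slice count of 4 are hypothetical, and the helper build_reindex_request is only an illustration that builds the request without sending it.

```python
import json


def build_reindex_request(source_index, dest_index, slices=None):
    """Return (method, path, body) for an Elasticsearch _reindex call.

    slices=None leaves the query parameter out, so the reindex runs unsliced.
    An integer (or "auto") asks Elasticsearch to split the work into slices.
    """
    path = "/_reindex"
    if slices is not None:
        path += f"?slices={slices}"
    body = {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }
    return "POST", path, body


# Hypothetical index names and slice count, for illustration only.
sliced = build_reindex_request("my-source-index", "my-dest-index", slices=4)
unsliced = build_reindex_request("my-source-index", "my-dest-index")

for method, path, body in (sliced, unsliced):
    print(method, path)
    print(json.dumps(body, indent=2))
```

The two requests differ only in the slices query parameter, which matches the observation above that the failure shows up with and without slicing.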
Elasticsearch uses these settings:
Logs (if relevant)
1st type:
Message:
failed to serialize outbound message [Response{456614}{false}{false}{false}{class org.elasticsearch.search.fetch.FetchSearchResult}]
Stack:
2nd type:
Failed to execute phase [fetch], Partial shards failure; shardFailures {[W-OgW1gmQ3qFNOuZKwHtOw][profiles_reindex3][51]: org.elasticsearch.transport.RemoteTransportException: [elasticsearch-tracking-15][10.84.22.6:9300][indices:data/read/search[phase/fetch/id]]