afoucret opened this issue 7 months ago
I found some other failures for this test recently, so it is likely that this is flakiness.
Pinging @elastic/es-search (Team:Search)
Looking at the logs, a node crashed, suspiciously similar to: https://github.com/elastic/elasticsearch/issues/102261
Seems like we have a weird bug here with mixed clusters and chunked serialization. #102261
indicates it's the ingest stats causing the failure.
[2023-11-24T09:08:19,764][WARN ][i.n.c.AbstractChannelHandlerContext] [v8.11.2-0] Failed to mark a promise as failure because it has failed already: DefaultChannelPromise@26bdc5aa(failure: java.nio.channels.ClosedChannelException), unnotified cause: java.nio.channels.ClosedChannelException
at org.elasticsearch.transport.netty4@8.12.0-SNAPSHOT/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.write(Netty4HttpPipeliningHandler.java:168)
at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:879)
at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:940)
at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1247)
at io.netty.common@4.1.94.Final/io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
at io.netty.common@4.1.94.Final/io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
at io.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
at io.netty.transport@4.1.94.Final/io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
at io.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.common@4.1.94.Final/io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.base/java.lang.Thread.run(Thread.java:1583)
java.lang.AssertionError: failure encoding chunk
at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.rest.ChunkedRestResponseBody$1.encodeChunk(ChunkedRestResponseBody.java:131)
at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.rest.RestController$EncodedLengthTrackingChunkedRestResponseBody.encodeChunk(RestController.java:838)
at org.elasticsearch.transport.netty4@8.12.0-SNAPSHOT/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.writeChunk(Netty4HttpPipeliningHandler.java:314)
at org.elasticsearch.transport.netty4@8.12.0-SNAPSHOT/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.doWrite(Netty4HttpPipeliningHandler.java:222)
at org.elasticsearch.transport.netty4@8.12.0-SNAPSHOT/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.doWrite(Netty4HttpPipeliningHandler.java:197)
at org.elasticsearch.transport.netty4@8.12.0-SNAPSHOT/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.write(Netty4HttpPipeliningHandler.java:160)
at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:879)
at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:940)
at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1247)
at io.netty.common@4.1.94.Final/io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
at io.netty.common@4.1.94.Final/io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
at io.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
at io.netty.transport@4.1.94.Final/io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
at io.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.common@4.1.94.Final/io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.base/java.lang.Thread.run(Thread.java:1583)
[2023-11-24T09:08:19,766][ERROR][o.e.ExceptionsHelper ] [v8.11.2-0] fatal error java.lang.AssertionError: failure encoding chunk
at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.ExceptionsHelper.lambda$maybeDieOnAnotherThread$4(ExceptionsHelper.java:257)
at java.base/java.util.Optional.ifPresent(Optional.java:178)
at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.ExceptionsHelper.maybeDieOnAnotherThread(ExceptionsHelper.java:247)
at org.elasticsearch.transport.netty4@8.12.0-SNAPSHOT/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.exceptionCaught(Netty4HttpPipeliningHandler.java:378)
at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:346)
at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:928)
at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:941)
at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1247)
at io.netty.common@4.1.94.Final/io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
at io.netty.common@4.1.94.Final/io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
at io.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
at io.netty.transport@4.1.94.Final/io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
at io.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.common@4.1.94.Final/io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.base/java.lang.Thread.run(Thread.java:1583)
[2023-11-24T09:08:19,767][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [v8.11.2-0] fatal error in thread [elasticsearch-error-rethrower], exiting java.lang.AssertionError: failure encoding chunk
at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.rest.ChunkedRestResponseBody$1.encodeChunk(ChunkedRestResponseBody.java:131)
at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.rest.RestController$EncodedLengthTrackingChunkedRestResponseBody.encodeChunk(RestController.java:838)
at org.elasticsearch.transport.netty4@8.12.0-SNAPSHOT/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.writeChunk(Netty4HttpPipeliningHandler.java:314)
at org.elasticsearch.transport.netty4@8.12.0-SNAPSHOT/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.doFlush(Netty4HttpPipeliningHandler.java:296)
at org.elasticsearch.transport.netty4@8.12.0-SNAPSHOT/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.flush(Netty4HttpPipeliningHandler.java:260)
at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:923)
at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:941)
at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1247)
at io.netty.common@4.1.94.Final/io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
at io.netty.common@4.1.94.Final/io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
at io.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
at io.netty.transport@4.1.94.Final/io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
at io.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.common@4.1.94.Final/io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.base/java.lang.Thread.run(Thread.java:1583)
[2023-11-24T09:08:19,768][WARN ][o.e.h.AbstractHttpServerTransport] [v8.11.2-0] caught exception while handling client http traffic, closing connection Netty4HttpChannel{localAddress=/127.0.0.1:45177, remoteAddress=/127.0.0.1:33596} java.lang.Exception: java.lang.AssertionError: failure encoding chunk
at org.elasticsearch.transport.netty4@8.12.0-SNAPSHOT/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.exceptionCaught(Netty4HttpPipeliningHandler.java:383)
at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:346)
at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:928)
at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:941)
at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1247)
at io.netty.common@4.1.94.Final/io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
at io.netty.common@4.1.94.Final/io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
at io.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
at io.netty.transport@4.1.94.Final/io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
at io.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.common@4.1.94.Final/io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.lang.AssertionError: failure encoding chunk
at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.rest.ChunkedRestResponseBody$1.encodeChunk(ChunkedRestResponseBody.java:131)
at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.rest.RestController$EncodedLengthTrackingChunkedRestResponseBody.encodeChunk(RestController.java:838)
at org.elasticsearch.transport.netty4@8.12.0-SNAPSHOT/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.writeChunk(Netty4HttpPipeliningHandler.java:314)
at org.elasticsearch.transport.netty4@8.12.0-SNAPSHOT/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.doFlush(Netty4HttpPipeliningHandler.java:296)
at org.elasticsearch.transport.netty4@8.12.0-SNAPSHOT/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.flush(Netty4HttpPipeliningHandler.java:260)
at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:923)
... 9 more
This failure seems to have the same cause as this issue: https://github.com/elastic/elasticsearch/issues/102261
We need to determine why the IngestService is returning different processors for the same pipeline. This is causing the node to crash.
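The general hazard can be illustrated with a minimal, hypothetical Java sketch (this is not the actual Elasticsearch code; the class and its contents are invented for illustration): a chunked REST response iterates lazily over shared mutable stats, and if the stats change between chunk writes, encoding the next chunk throws, which the encoder's assertion turns into the fatal "failure encoding chunk" error seen above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch, not the real Netty4HttpPipeliningHandler code:
// a chunked response holds a live iterator over mutable pipeline stats.
public class ChunkedOverMutableState {

    // Encodes "chunks" from a live view of the stats; mutates the stats
    // mid-stream to simulate a concurrent pipeline update.
    static String encode() {
        List<String> pipelineStats = new ArrayList<>(List.of("p1", "p2", "p3"));
        Iterator<String> chunks = pipelineStats.iterator();

        StringBuilder body = new StringBuilder(chunks.next()); // first chunk is fine
        pipelineStats.add("p4"); // concurrent mutation between chunk writes

        try {
            body.append(chunks.next()); // encoding the next chunk fails
            return body.toString();
        } catch (java.util.ConcurrentModificationException e) {
            // in the real code an assert converts this into a fatal error
            return "failure encoding chunk";
        }
    }

    public static void main(String[] args) {
        System.out.println(encode()); // prints "failure encoding chunk"
    }
}
```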
OK, that 7.17 failure is something very different:
java.lang.IndexOutOfBoundsException: Index 0 out of bounds for length 0
Interesting...
Ah, two nodes crashed? mixed-cluster.zip
[2024-03-21T20:42:49,041][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [v7.1.0-3] fatal error in thread [elasticsearch[v7.1.0-3][clusterApplierService#updateTask][T#1]], exiting
java.lang.AssertionError: {_doc=org.elasticsearch.index.mapper.DocumentMapper@79314ebf}
at org.elasticsearch.index.mapper.MapperService.assertMappingVersion(MapperService.java:264) ~[elasticsearch-7.1.0.jar:7.1.0]
at org.elasticsearch.index.mapper.MapperService.updateMapping(MapperService.java:218) ~[elasticsearch-7.1.0.jar:7.1.0]
at org.elasticsearch.index.IndexService.updateMapping(IndexService.java:551) ~[elasticsearch-7.1.0.jar:7.1.0]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.updateIndices(IndicesClusterStateService.java:533) ~[elasticsearch-7.1.0.jar:7.1.0]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:266) ~[elasticsearch-7.1.0.jar:7.1.0]
at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$5(ClusterApplierService.java:478) ~[elasticsearch-7.1.0.jar:7.1.0]
at java.lang.Iterable.forEach(Iterable.java:75) ~[?:?]
at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:476) ~[elasticsearch-7.1.0.jar:7.1.0]
at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:459) ~[elasticsearch-7.1.0.jar:7.1.0]
at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:413) ~[elasticsearch-7.1.0.jar:7.1.0]
at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:164) ~[elasticsearch-7.1.0.jar:7.1.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) ~[elasticsearch-7.1.0.jar:7.1.0]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) ~[elasticsearch-7.1.0.jar:7.1.0]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) ~[elasticsearch-7.1.0.jar:7.1.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:835) [?:?]
[2024-03-21T20:44:57,353][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [v7.1.0-2] fatal error in thread [elasticsearch[v7.1.0-2][clusterApplierService#updateTask][T#1]], exiting
java.lang.AssertionError: {_doc=org.elasticsearch.index.mapper.DocumentMapper@2fbe67de}
at org.elasticsearch.index.mapper.MapperService.assertMappingVersion(MapperService.java:264) ~[elasticsearch-7.1.0.jar:7.1.0]
at org.elasticsearch.index.mapper.MapperService.updateMapping(MapperService.java:218) ~[elasticsearch-7.1.0.jar:7.1.0]
at org.elasticsearch.index.IndexService.updateMapping(IndexService.java:551) ~[elasticsearch-7.1.0.jar:7.1.0]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.updateIndices(IndicesClusterStateService.java:533) ~[elasticsearch-7.1.0.jar:7.1.0]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:266) ~[elasticsearch-7.1.0.jar:7.1.0]
at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$5(ClusterApplierService.java:478) ~[elasticsearch-7.1.0.jar:7.1.0]
at java.lang.Iterable.forEach(Iterable.java:75) ~[?:?]
at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:476) ~[elasticsearch-7.1.0.jar:7.1.0]
at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:459) ~[elasticsearch-7.1.0.jar:7.1.0]
at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:413) ~[elasticsearch-7.1.0.jar:7.1.0]
at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:164) ~[elasticsearch-7.1.0.jar:7.1.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) ~[elasticsearch-7.1.0.jar:7.1.0]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) ~[elasticsearch-7.1.0.jar:7.1.0]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) ~[elasticsearch-7.1.0.jar:7.1.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:835) [?:?]
OK, here is where the assertion is tripped:
Apparently the {_doc=org.elasticsearch.index.mapper.DocumentMapper@2fbe67de}
for this node (or the other one) is an update, but the mapping version value isn't incremented?
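A hypothetical sketch of the invariant being checked (this is not the real MapperService code; the method and types are invented for illustration): if the incoming cluster state carries the same mapping version we have already applied, the mapping content must be unchanged, so a mapping that changes without a version bump trips the assertion and kills the node.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of the pre-7.12 assertMappingVersion invariant.
public class MappingVersionInvariant {

    static boolean versionInvariantHolds(long currentVersion, Map<String, String> currentMappings,
                                         long incomingVersion, Map<String, String> incomingMappings) {
        if (incomingVersion == currentVersion) {
            // same version => the serialized mappings must be identical
            return Objects.equals(currentMappings, incomingMappings);
        }
        return true; // version was bumped, so a mapping update is expected
    }

    public static void main(String[] args) {
        // same version, same content: invariant holds
        System.out.println(versionInvariantHolds(3, Map.of("_doc", "{\"a\":1}"),
                                                 3, Map.of("_doc", "{\"a\":1}"))); // true
        // same version, different serialized content: this is the shape of the
        // {_doc=DocumentMapper@...} AssertionError in the logs above
        System.out.println(versionInvariantHolds(3, Map.of("_doc", "{\"a\":1}"),
                                                 3, Map.of("_doc", "{ \"a\" : 1 }"))); // false
    }
}
```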
Well, I can see that https://github.com/elastic/elasticsearch/commit/921462a94f375cca57b48ae4f2df1650cc10ca4f
removed this assertion check starting in 7.12, but I honestly don't understand why. I would have thought that we would still want to check that the mapping version gets updated when there are changes?
Then later we completely refactored the version checking here: https://github.com/elastic/elasticsearch/commit/3d963a3174d8a6e6e5145c7e4b025707745736cd
This was in 7.13.
I am guessing we accidentally removed the assertions, then changed how we check for differences, and this introduced some subtle bug?
@javanna Do you have any ideas? I know this was a while ago :/
Ah, so this test failure is likely due to a "representation change" of the object and not necessarily an "equality change": https://github.com/elastic/elasticsearch/issues/59427
This tells me this assertion tripping on old nodes is...useless? I honestly don't know.
This keeps getting weirder @javanna :)
I see this really old test failure: https://github.com/elastic/elasticsearch/issues/86887
Its root cause is the same as this new failure: running into a mapping assertion failure because the representation is different, but apparently we don't increment the version values if the xcontent is still semantically the same?
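The "representation change" versus "equality change" distinction can be sketched with a small, hypothetical Java example (the normalization here is a crude stand-in for real xcontent parsing, not the actual comparison Elasticsearch performs): two serializations describe the same mapping, so the version is arguably correctly not bumped, yet a byte-level comparison sees them as different and would trip the old assertion.

```java
// Hypothetical illustration: same mapping, two different serializations.
public class RepresentationVsEquality {

    // Crude whitespace normalization stands in for real xcontent parsing;
    // it would mangle whitespace inside string values, so it is only a sketch.
    static String normalize(String json) {
        return json.replaceAll("\\s", "");
    }

    public static void main(String[] args) {
        String v1 = "{\"properties\":{\"field\":{\"type\":\"keyword\"}}}";
        String v2 = "{ \"properties\" : { \"field\" : { \"type\" : \"keyword\" } } }";

        System.out.println("bytesEqual=" + v1.equals(v2));                              // false
        System.out.println("semanticallyEqual=" + normalize(v1).equals(normalize(v2))); // true
    }
}
```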
Pinging @elastic/es-search-foundations (Team:Search Foundations)
Tests:
Three tests are failing with the same error.

CI Link:
https://gradle-enterprise.elastic.co/s/foomrrnt7s2ta

Repro line:

Does it reproduce?
No

Applicable branches:
main

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?search.timeZoneId=Europe%2FParis&tests.container=org.elasticsearch.backwards.SearchWithMinCompatibleSearchNodeIT&tests.test=testMinVersionAsOldVersion

Failure excerpt: