elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

[CI] SearchWithMinCompatibleSearchNodeIT testMinVersionAsOldVersion failing #102575

Open afoucret opened 7 months ago

afoucret commented 7 months ago

Tests:

Three tests are failing with the same error:

CI Link

https://gradle-enterprise.elastic.co/s/foomrrnt7s2ta

Repro line

./gradlew ':qa:mixed-cluster:v8.11.2#mixedClusterTest' -Dtests.class="org.elasticsearch.backwards.SearchWithMinCompatibleSearchNodeIT" -Dtests.method="testMinVersionAsOldVersion" -Dtests.seed=5D68FEC6666DFBBE -Dtests.bwc=true -Dtests.locale=en-AU -Dtests.timezone=America/Belize -Druntime.java=21

Does it reproduce?

No

Applicable branches

main

Failure history

https://gradle-enterprise.elastic.co/scans/tests?search.timeZoneId=Europe%2FParis&tests.container=org.elasticsearch.backwards.SearchWithMinCompatibleSearchNodeIT&tests.test=testMinVersionAsOldVersion

Failure excerpt

java.lang.AssertionError: timed out waiting for green state for index [test_min_version] cluster state [{ |  
afoucret commented 7 months ago

I found some other failures for this test recently, so it is likely that this is flakiness.

(Screenshot: recent failure history, captured 2023-11-24 at 10:46:05)
elasticsearchmachine commented 7 months ago

Pinging @elastic/es-search (Team:Search)

benwtrent commented 7 months ago

Looking at the logs, a node crashed, suspiciously similar to: https://github.com/elastic/elasticsearch/issues/102261

Seems like we have a weird bug here in mixed clusters and chunked serialization. #102261 indicates it's the ingest stats causing the failure.
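To illustrate the failure mode being hypothesized here, the following is a minimal sketch (with entirely hypothetical class and method names, not the actual Elasticsearch code): chunked serialization derives a chunk count from one read of the stats, then re-reads the source while writing, so a service that returns different data across calls (as the IngestService appears to do in #102261) fails mid-encode.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ChunkMismatchSketch {
    // Hypothetical stand-in for a stats provider that is not stable across calls.
    static class UnstableStatsService {
        private int calls = 0;
        List<String> processors() {
            calls++;
            // Second call returns a shorter list, mimicking a service handing
            // back different processors for the same pipeline.
            return calls == 1 ? List.of("set", "rename", "grok") : List.of("set");
        }
    }

    static List<String> encodeChunks(UnstableStatsService stats) {
        int expectedChunks = stats.processors().size();           // snapshot #1
        Iterator<String> payload = stats.processors().iterator(); // snapshot #2
        List<String> chunks = new ArrayList<>();
        for (int i = 0; i < expectedChunks; i++) {
            // The mismatch surfaces partway through encoding, analogous to the
            // "failure encoding chunk" AssertionError in the logs below.
            if (!payload.hasNext()) {
                throw new AssertionError("failure encoding chunk");
            }
            chunks.add(payload.next());
        }
        return chunks;
    }

    public static void main(String[] args) {
        try {
            encodeChunks(new UnstableStatsService());
            System.out.println("no mismatch");
        } catch (AssertionError e) {
            // prints: AssertionError: failure encoding chunk
            System.out.println("AssertionError: " + e.getMessage());
        }
    }
}
```

Since the real assertion lives on the hot write path, any such instability only kills nodes in assertion-enabled test builds, which is consistent with this showing up in CI rather than production.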

[2023-11-24T09:08:19,764][WARN ][i.n.c.AbstractChannelHandlerContext] [v8.11.2-0] Failed to mark a promise as failure because it has failed already: DefaultChannelPromise@26bdc5aa(failure: java.nio.channels.ClosedChannelException), unnotified cause: java.nio.channels.ClosedChannelException
    at org.elasticsearch.transport.netty4@8.12.0-SNAPSHOT/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.write(Netty4HttpPipeliningHandler.java:168)
    at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:879)
    at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:940)
    at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1247)
    at io.netty.common@4.1.94.Final/io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
    at io.netty.common@4.1.94.Final/io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
    at io.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
    at io.netty.transport@4.1.94.Final/io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
    at io.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
    at io.netty.common@4.1.94.Final/io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.base/java.lang.Thread.run(Thread.java:1583)
java.lang.AssertionError: failure encoding chunk
    at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.rest.ChunkedRestResponseBody$1.encodeChunk(ChunkedRestResponseBody.java:131)
    at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.rest.RestController$EncodedLengthTrackingChunkedRestResponseBody.encodeChunk(RestController.java:838)
    at org.elasticsearch.transport.netty4@8.12.0-SNAPSHOT/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.writeChunk(Netty4HttpPipeliningHandler.java:314)
    at org.elasticsearch.transport.netty4@8.12.0-SNAPSHOT/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.doWrite(Netty4HttpPipeliningHandler.java:222)
    at org.elasticsearch.transport.netty4@8.12.0-SNAPSHOT/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.doWrite(Netty4HttpPipeliningHandler.java:197)
    at org.elasticsearch.transport.netty4@8.12.0-SNAPSHOT/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.write(Netty4HttpPipeliningHandler.java:160)
    at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:879)
    at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:940)
    at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1247)
    at io.netty.common@4.1.94.Final/io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
    at io.netty.common@4.1.94.Final/io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
    at io.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
    at io.netty.transport@4.1.94.Final/io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
    at io.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
    at io.netty.common@4.1.94.Final/io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.base/java.lang.Thread.run(Thread.java:1583)

[2023-11-24T09:08:19,766][ERROR][o.e.ExceptionsHelper     ] [v8.11.2-0] fatal error java.lang.AssertionError: failure encoding chunk
    at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.ExceptionsHelper.lambda$maybeDieOnAnotherThread$4(ExceptionsHelper.java:257)
    at java.base/java.util.Optional.ifPresent(Optional.java:178)
    at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.ExceptionsHelper.maybeDieOnAnotherThread(ExceptionsHelper.java:247)
    at org.elasticsearch.transport.netty4@8.12.0-SNAPSHOT/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.exceptionCaught(Netty4HttpPipeliningHandler.java:378)
    at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:346)
    at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:928)
    at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:941)
    at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1247)
    at io.netty.common@4.1.94.Final/io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
    at io.netty.common@4.1.94.Final/io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
    at io.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
    at io.netty.transport@4.1.94.Final/io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
    at io.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
    at io.netty.common@4.1.94.Final/io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.base/java.lang.Thread.run(Thread.java:1583)
[2023-11-24T09:08:19,767][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [v8.11.2-0] fatal error in thread [elasticsearch-error-rethrower], exiting java.lang.AssertionError: failure encoding chunk
    at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.rest.ChunkedRestResponseBody$1.encodeChunk(ChunkedRestResponseBody.java:131)
    at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.rest.RestController$EncodedLengthTrackingChunkedRestResponseBody.encodeChunk(RestController.java:838)
    at org.elasticsearch.transport.netty4@8.12.0-SNAPSHOT/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.writeChunk(Netty4HttpPipeliningHandler.java:314)
    at org.elasticsearch.transport.netty4@8.12.0-SNAPSHOT/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.doFlush(Netty4HttpPipeliningHandler.java:296)
    at org.elasticsearch.transport.netty4@8.12.0-SNAPSHOT/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.flush(Netty4HttpPipeliningHandler.java:260)
    at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:923)
    at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:941)
    at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1247)
    at io.netty.common@4.1.94.Final/io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
    at io.netty.common@4.1.94.Final/io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
    at io.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
    at io.netty.transport@4.1.94.Final/io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
    at io.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
    at io.netty.common@4.1.94.Final/io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.base/java.lang.Thread.run(Thread.java:1583)

[2023-11-24T09:08:19,768][WARN ][o.e.h.AbstractHttpServerTransport] [v8.11.2-0] caught exception while handling client http traffic, closing connection Netty4HttpChannel{localAddress=/127.0.0.1:45177, remoteAddress=/127.0.0.1:33596} java.lang.Exception: java.lang.AssertionError: failure encoding chunk
    at org.elasticsearch.transport.netty4@8.12.0-SNAPSHOT/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.exceptionCaught(Netty4HttpPipeliningHandler.java:383)
    at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:346)
    at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:928)
    at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:941)
    at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1247)
    at io.netty.common@4.1.94.Final/io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
    at io.netty.common@4.1.94.Final/io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
    at io.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
    at io.netty.transport@4.1.94.Final/io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
    at io.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
    at io.netty.common@4.1.94.Final/io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.lang.AssertionError: failure encoding chunk
    at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.rest.ChunkedRestResponseBody$1.encodeChunk(ChunkedRestResponseBody.java:131)
    at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.rest.RestController$EncodedLengthTrackingChunkedRestResponseBody.encodeChunk(RestController.java:838)
    at org.elasticsearch.transport.netty4@8.12.0-SNAPSHOT/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.writeChunk(Netty4HttpPipeliningHandler.java:314)
    at org.elasticsearch.transport.netty4@8.12.0-SNAPSHOT/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.doFlush(Netty4HttpPipeliningHandler.java:296)
    at org.elasticsearch.transport.netty4@8.12.0-SNAPSHOT/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.flush(Netty4HttpPipeliningHandler.java:260)
    at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:923)
    ... 9 more
benwtrent commented 7 months ago

This failure seems to be the same cause as this issue: https://github.com/elastic/elasticsearch/issues/102261

We need to determine why the IngestService is returning different processors for the same pipeline. This is causing the node to crash.

ldematte commented 4 months ago

Another failure in the 7.17 branch: https://gradle-enterprise.elastic.co/s/5st4xtvfascbk/tests/task/:qa:mixed-cluster:v7.1.0%23mixedClusterTest/details/org.elasticsearch.backwards.SearchWithMinCompatibleSearchNodeIT/testMinVersionAsNewVersion?top-execution=1

benwtrent commented 3 months ago

OK, that 7.17 failure is something very different:

java.lang.IndexOutOfBoundsException: Index 0 out of bounds for length 0

Interesting...

benwtrent commented 3 months ago

Ah, two nodes crashed? mixed-cluster.zip

[2024-03-21T20:42:49,041][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [v7.1.0-3] fatal error in thread [elasticsearch[v7.1.0-3][clusterApplierService#updateTask][T#1]], exiting
java.lang.AssertionError: {_doc=org.elasticsearch.index.mapper.DocumentMapper@79314ebf}
    at org.elasticsearch.index.mapper.MapperService.assertMappingVersion(MapperService.java:264) ~[elasticsearch-7.1.0.jar:7.1.0]
    at org.elasticsearch.index.mapper.MapperService.updateMapping(MapperService.java:218) ~[elasticsearch-7.1.0.jar:7.1.0]
    at org.elasticsearch.index.IndexService.updateMapping(IndexService.java:551) ~[elasticsearch-7.1.0.jar:7.1.0]
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.updateIndices(IndicesClusterStateService.java:533) ~[elasticsearch-7.1.0.jar:7.1.0]
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:266) ~[elasticsearch-7.1.0.jar:7.1.0]
    at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$5(ClusterApplierService.java:478) ~[elasticsearch-7.1.0.jar:7.1.0]
    at java.lang.Iterable.forEach(Iterable.java:75) ~[?:?]
    at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:476) ~[elasticsearch-7.1.0.jar:7.1.0]
    at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:459) ~[elasticsearch-7.1.0.jar:7.1.0]
    at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:413) ~[elasticsearch-7.1.0.jar:7.1.0]
    at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:164) ~[elasticsearch-7.1.0.jar:7.1.0]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) ~[elasticsearch-7.1.0.jar:7.1.0]
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) ~[elasticsearch-7.1.0.jar:7.1.0]
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) ~[elasticsearch-7.1.0.jar:7.1.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at java.lang.Thread.run(Thread.java:835) [?:?]
[2024-03-21T20:44:57,353][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [v7.1.0-2] fatal error in thread [elasticsearch[v7.1.0-2][clusterApplierService#updateTask][T#1]], exiting
java.lang.AssertionError: {_doc=org.elasticsearch.index.mapper.DocumentMapper@2fbe67de}
    at org.elasticsearch.index.mapper.MapperService.assertMappingVersion(MapperService.java:264) ~[elasticsearch-7.1.0.jar:7.1.0]
    at org.elasticsearch.index.mapper.MapperService.updateMapping(MapperService.java:218) ~[elasticsearch-7.1.0.jar:7.1.0]
    at org.elasticsearch.index.IndexService.updateMapping(IndexService.java:551) ~[elasticsearch-7.1.0.jar:7.1.0]
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.updateIndices(IndicesClusterStateService.java:533) ~[elasticsearch-7.1.0.jar:7.1.0]
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:266) ~[elasticsearch-7.1.0.jar:7.1.0]
    at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$5(ClusterApplierService.java:478) ~[elasticsearch-7.1.0.jar:7.1.0]
    at java.lang.Iterable.forEach(Iterable.java:75) ~[?:?]
    at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:476) ~[elasticsearch-7.1.0.jar:7.1.0]
    at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:459) ~[elasticsearch-7.1.0.jar:7.1.0]
    at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:413) ~[elasticsearch-7.1.0.jar:7.1.0]
    at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:164) ~[elasticsearch-7.1.0.jar:7.1.0]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) ~[elasticsearch-7.1.0.jar:7.1.0]
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) ~[elasticsearch-7.1.0.jar:7.1.0]
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) ~[elasticsearch-7.1.0.jar:7.1.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at java.lang.Thread.run(Thread.java:835) [?:?]
benwtrent commented 3 months ago

OK, here is where the assertion is tripped:

https://github.com/elastic/elasticsearch/blob/3ce7d2c9b621e796ace43008134413cea7648a06/server/src/main/java/org/elasticsearch/index/mapper/MapperService.java#L259-L264

Apparently the {_doc=org.elasticsearch.index.mapper.DocumentMapper@2fbe67de} on one node or the other is an update, but the mapping version value isn't incremented?
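To make the tripped invariant concrete, here is a hedged sketch of the kind of check the old `MapperService.assertMappingVersion` performs (hypothetical simplified signature, not the 7.1 source): if the cluster-state mapping version did not change, then no mapper may have been updated, and the AssertionError's message is the map of updated mappers, matching the `{_doc=...DocumentMapper@...}` seen in the logs above.

```java
import java.util.Map;

public class MappingVersionSketch {
    // Simplified stand-in for the invariant: unchanged version => no updated mappers.
    static void assertMappingVersion(long currentVersion, long newVersion,
                                     Map<String, String> updatedMappers) {
        if (newVersion == currentVersion && !updatedMappers.isEmpty()) {
            // Mirrors the AssertionError whose message is the updated-mappers map,
            // e.g. {_doc=org.elasticsearch.index.mapper.DocumentMapper@79314ebf}
            throw new AssertionError(updatedMappers.toString());
        }
    }

    public static void main(String[] args) {
        // A rebuilt mapper registers as "updated" while the version stays put,
        // which on an assertion-enabled old node is a fatal error.
        try {
            assertMappingVersion(3, 3, Map.of("_doc", "DocumentMapper@2fbe67de"));
        } catch (AssertionError e) {
            // prints: tripped: {_doc=DocumentMapper@2fbe67de}
            System.out.println("tripped: " + e.getMessage());
        }
    }
}
```

Under this reading, the bug is not that the assertion is wrong but that newer masters can republish a mapping that differs only in representation without bumping the version, which old assertion-enabled nodes interpret as an unversioned update.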

benwtrent commented 3 months ago

Well, I can see that https://github.com/elastic/elasticsearch/commit/921462a94f375cca57b48ae4f2df1650cc10ca4f removed this assertion check starting in 7.12, but I honestly don't understand why. I would have thought that we would still want to check that the mapping version gets updated when there are changes?

Then later we completely refactored the version checking here: https://github.com/elastic/elasticsearch/commit/3d963a3174d8a6e6e5145c7e4b025707745736cd

This was in 7.13.

I am guessing we accidentally removed assertions and then changed how we check for the differences and this then introduced some subtle bug?

@javanna Do you have any ideas? I know this was a while ago :/

benwtrent commented 3 months ago

Ah, so this test failure is likely due to a "representative change" of the object and not necessarily an "equality change": https://github.com/elastic/elasticsearch/issues/59427

This tells me this assertion trip on old nodes is...useless? I honestly don't know.

This keeps getting weirder @javanna :)

benwtrent commented 3 months ago

I see this really old test failure: https://github.com/elastic/elasticsearch/issues/86887

Its roots are the same as this new failure: it runs into the mapping assertion because the object representation is different, but apparently we don't increment the version values when the xcontent is the same?
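The "representative change vs. equality change" distinction above can be sketched in a few lines (a toy type, not the real `DocumentMapper`): a mapping merge can produce a fresh object whose serialized xContent is identical to the old one, so value equality holds (and the version is not bumped) while reference identity does not (so an identity-based check counts it as an update).

```java
public class RepresentativeChangeSketch {
    // Toy mapper; records get value-based equals()/hashCode() for free.
    record Mapper(String xContent) {}

    public static void main(String[] args) {
        Mapper before = new Mapper("{\"_doc\":{\"properties\":{}}}");
        Mapper after  = new Mapper("{\"_doc\":{\"properties\":{}}}"); // rebuilt after a merge

        boolean sameContent = before.equals(after); // true  -> version not incremented
        boolean sameObject  = before == after;      // false -> treated as "updated"
        // prints: true false
        System.out.println(sameContent + " " + sameObject);
    }
}
```

If the version bump is gated on content equality but the updated-mapper bookkeeping is gated on identity, the two can disagree, which would explain the assertion tripping only on old nodes in mixed clusters.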

elasticsearchmachine commented 2 days ago

Pinging @elastic/es-search-foundations (Team:Search Foundations)