elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

[CI] MLModelDeploymentsUpgradeIT testTrainedModelDeployment failing #92153

Closed original-brownbear closed 1 year ago

original-brownbear commented 1 year ago

This seems to be a serialization bug:

» [2022-12-06T12:32:02,718][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [v8.6.0-2] fatal error in thread [Thread-10], exiting java.lang.AssertionError: java.lang.IllegalStateException: Message not fully read (request) for requestId [186], action [cluster:monitor/xpack/ml/trained_models/deployment/infer[n]], available [0]; resetting
»   at org.elasticsearch.server@8.6.0-SNAPSHOT/org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:271)
»   at org.elasticsearch.server@8.6.0-SNAPSHOT/org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:116)
»   at org.elasticsearch.server@8.6.0-SNAPSHOT/org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:95)
»   at org.elasticsearch.server@8.6.0-SNAPSHOT/org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:808)
»   at org.elasticsearch.server@8.6.0-SNAPSHOT/org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:149)
»   at org.elasticsearch.server@8.6.0-SNAPSHOT/org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:121)
»   at org.elasticsearch.server@8.6.0-SNAPSHOT/org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:86)
»   at org.elasticsearch.transport.netty4@8.6.0-SNAPSHOT/org.elasticsearch.transport.netty4.Netty4MessageInboundHandler.channelRead(Netty4MessageInboundHandler.java:63)
»   at io.netty.transport@4.1.84.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
»   at io.netty.transport@4.1.84.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
»   at io.netty.transport@4.1.84.Final/io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
»   at io.netty.handler@4.1.84.Final/io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280)
»   at io.netty.transport@4.1.84.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
»   at io.netty.transport@4.1.84.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
»   at io.netty.transport@4.1.84.Final/io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
»   at io.netty.codec@4.1.84.Final/io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
»   at io.netty.transport@4.1.84.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
»   at io.netty.transport@4.1.84.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
»   at io.netty.transport@4.1.84.Final/io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
»   at io.netty.handler@4.1.84.Final/io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1373)
»   at io.netty.handler@4.1.84.Final/io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1236)
»   at io.netty.handler@4.1.84.Final/io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1285)
»   at io.netty.codec@4.1.84.Final/io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:519)
»   at io.netty.codec@4.1.84.Final/io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:458)
»   at io.netty.codec@4.1.84.Final/io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:280)
»   at io.netty.transport@4.1.84.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
»   at io.netty.transport@4.1.84.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
»   at io.netty.transport@4.1.84.Final/io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
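
For readers unfamiliar with this class of failure, a "Message not fully read" error generally means the receiving node finished deserializing a request while bytes were still left in the message. In a mixed-version cluster, one common way this can happen is a sender writing a field that the receiver's read path does not know about. The following is a minimal, self-contained sketch of that general mechanism using plain `java.io` streams. It is not Elasticsearch code: the class name, version constants, and the extra boolean field are all hypothetical, and the thread does not state which field or which direction of the upgrade caused this particular failure.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class WireMismatchSketch {
    // Hypothetical wire versions, for illustration only.
    public static final int OLD_VERSION = 1;
    public static final int NEW_VERSION = 2;

    // Sender side: serializes a request for a receiver at receiverVersion.
    // When gateOnVersion is false, the sender always writes the new field,
    // which is the kind of bug sketched here.
    public static byte[] write(int receiverVersion, boolean gateOnVersion) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF("upgrade-deployment-test"); // field known to both versions
            if (!gateOnVersion || receiverVersion >= NEW_VERSION) {
                out.writeBoolean(true); // hypothetical field added in NEW_VERSION
            }
        }
        return bytes.toByteArray();
    }

    // Receiver side (old node): reads only the fields it knows about and
    // returns how many bytes of the message were left unread.
    public static int readAsOldNode(byte[] message) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(message))) {
            in.readUTF();
            return in.available();
        }
    }

    public static void main(String[] args) throws IOException {
        // Ungated write: the old reader leaves the extra boolean unread.
        System.out.println("ungated leftover bytes: " + readAsOldNode(write(OLD_VERSION, false)));
        // Version-gated write: the old reader consumes the whole message.
        System.out.println("gated leftover bytes: " + readAsOldNode(write(OLD_VERSION, true)));
    }
}
```

In this sketch the ungated write leaves one unread byte on the old reader, while the version-gated write leaves none. A transport layer that asserts every message is fully consumed would reject the first case.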

Build scan: https://gradle-enterprise.elastic.co/s/m5o3yp5vknx4g/tests/:x-pack:qa:rolling-upgrade:v8.6.0%23oneThirdUpgradedTest/org.elasticsearch.upgrades.MLModelDeploymentsUpgradeIT/testTrainedModelDeployment

Reproduction line:

./gradlew ':x-pack:qa:rolling-upgrade:v8.6.0#oneThirdUpgradedTest' -Dtests.class="org.elasticsearch.upgrades.MLModelDeploymentsUpgradeIT" -Dtests.method="testTrainedModelDeployment" -Dtests.seed=AAE1F74104DC8F5F -Dtests.bwc=true -Dtests.locale=el-CY -Dtests.timezone=Pacific/Guam -Druntime.java=17

Applicable branches: main

Reproduces locally?: Didn't try

Failure history: https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.upgrades.MLModelDeploymentsUpgradeIT&tests.test=testTrainedModelDeployment

Failure excerpt:

org.elasticsearch.client.ResponseException: method [POST], host [http://127.0.0.1:33908], URI [/_ml/trained_models/upgrade-deployment-test/deployment/_infer], status line [HTTP/1.1 500 Internal Server Error]
Warnings: [[POST /_ml/trained_models/{model_id}/deployment/_infer] is deprecated! Use [POST /_ml/trained_models/{model_id}/_infer] instead.]
{"error":{"root_cause":[{"type":"node_disconnected_exception","reason":"[v8.6.0-2][127.0.0.1:34396][cluster:monitor/xpack/ml/trained_models/deployment/infer[n]] disconnected"}],"type":"failed_node_exception","reason":"Failed node [Kqcr2Up2RQmWKkflha-kxA]","node_id":"Kqcr2Up2RQmWKkflha-kxA","caused_by":{"type":"node_disconnected_exception","reason":"[v8.6.0-2][127.0.0.1:34396][cluster:monitor/xpack/ml/trained_models/deployment/infer[n]] disconnected"}},"status":500}

  at __randomizedtesting.SeedInfo.seed([AAE1F74104DC8F5F:5139B11E251C35C8]:0)
  at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:347)
  at org.elasticsearch.client.RestClient.performRequest(RestClient.java:313)
  at org.elasticsearch.client.RestClient.performRequest(RestClient.java:288)
  at org.elasticsearch.upgrades.MLModelDeploymentsUpgradeIT.infer(MLModelDeploymentsUpgradeIT.java:283)
  at org.elasticsearch.upgrades.MLModelDeploymentsUpgradeIT.assertInfer(MLModelDeploymentsUpgradeIT.java:183)
  at org.elasticsearch.upgrades.MLModelDeploymentsUpgradeIT.testTrainedModelDeployment(MLModelDeploymentsUpgradeIT.java:116)
  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-2)
  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:568)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:833)

elasticsearchmachine commented 1 year ago

Pinging @elastic/ml-core (Team:ML)

davidkyle commented 1 year ago

Thanks for raising this, @original-brownbear. I've opened a fix.