elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
69.71k stars 24.67k forks source link

[CI] TransformContinuousIT testContinousEvents failing on FIPS JVM #80197

Open tlrx opened 2 years ago

tlrx commented 2 years ago

We've been asked to raise FIPS related test failure and this looks like one, a ML test failed due to a dead node because of:

» [2021-11-02T13:43:52,327][WARN ][o.e.t.TcpTransport       ] [javaRestTest-1] exception caught on transport layer [Netty4TcpChannel{localAddress=/127.0.0.1:44922, remoteAddress=/127.0.0.1:55548, profile=default}], closing connection
»  io.netty.handler.codec.DecoderException: javax.net.ssl.SSLException: org.bouncycastle.tls.TlsFatalAlert: bad_record_mac(20)
»   at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:477) ~[netty-codec-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) ~[netty-codec-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:620) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:583) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [netty-common-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.66.Final.jar:4.1.66.Final]
»   at java.lang.Thread.run(Thread.java:831) [?:?]
»  Caused by: javax.net.ssl.SSLException: org.bouncycastle.tls.TlsFatalAlert: bad_record_mac(20)
»   at org.bouncycastle.jsse.provider.ProvSSLEngine.unwrap(ProvSSLEngine.java:489) ~[bctls-fips-1.0.9.jar:1.0.9]
»   at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:679) ~[?:?]
»   at io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:298) ~[netty-handler-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1344) ~[netty-handler-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1237) ~[netty-handler-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1286) ~[netty-handler-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507) ~[netty-codec-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446) ~[netty-codec-4.1.66.Final.jar:4.1.66.Final]
»   ... 16 more
»  Caused by: org.bouncycastle.tls.TlsFatalAlert: bad_record_mac(20)
»   at org.bouncycastle.tls.crypto.impl.TlsAEADCipher.decodeCiphertext(TlsAEADCipher.java:267) ~[bctls-fips-1.0.9.jar:1.0.9]
»   at org.bouncycastle.tls.RecordStream.decodeAndVerify(RecordStream.java:232) ~[bctls-fips-1.0.9.jar:1.0.9]
»   at org.bouncycastle.tls.RecordStream.readFullRecord(RecordStream.java:183) ~[bctls-fips-1.0.9.jar:1.0.9]
»   at org.bouncycastle.tls.TlsProtocol.safeReadFullRecord(TlsProtocol.java:727) ~[bctls-fips-1.0.9.jar:1.0.9]
»   at org.bouncycastle.tls.TlsProtocol.offerInput(TlsProtocol.java:1059) ~[bctls-fips-1.0.9.jar:1.0.9]
»   at org.bouncycastle.tls.TlsProtocol.offerInput(TlsProtocol.java:1027) ~[bctls-fips-1.0.9.jar:1.0.9]
»   at org.bouncycastle.jsse.provider.ProvSSLEngine.unwrap(ProvSSLEngine.java:445) ~[bctls-fips-1.0.9.jar:1.0.9]
»   at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:679) ~[?:?]
»   at io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:298) ~[netty-handler-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1344) ~[netty-handler-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1237) ~[netty-handler-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1286) ~[netty-handler-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507) ~[netty-codec-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446) ~[netty-codec-4.1.66.Final.jar:4.1.66.Final]
»   ... 16 more
»  Caused by: java.lang.IllegalStateException: 
»   at org.bouncycastle.tls.crypto.impl.jcajce.Exceptions.illegalStateException(Exceptions.java:10) ~[bctls-fips-1.0.9.jar:1.0.9]
»   at org.bouncycastle.tls.crypto.impl.jcajce.JceAEADCipherImpl.doFinal(JceAEADCipherImpl.java:146) ~[bctls-fips-1.0.9.jar:1.0.9]
»   at org.bouncycastle.tls.crypto.impl.TlsAEADCipher.decodeCiphertext(TlsAEADCipher.java:263) ~[bctls-fips-1.0.9.jar:1.0.9]
»   at org.bouncycastle.tls.RecordStream.decodeAndVerify(RecordStream.java:232) ~[bctls-fips-1.0.9.jar:1.0.9]
»   at org.bouncycastle.tls.RecordStream.readFullRecord(RecordStream.java:183) ~[bctls-fips-1.0.9.jar:1.0.9]
»   at org.bouncycastle.tls.TlsProtocol.safeReadFullRecord(TlsProtocol.java:727) ~[bctls-fips-1.0.9.jar:1.0.9]
»   at org.bouncycastle.tls.TlsProtocol.offerInput(TlsProtocol.java:1059) ~[bctls-fips-1.0.9.jar:1.0.9]
»   at org.bouncycastle.tls.TlsProtocol.offerInput(TlsProtocol.java:1027) ~[bctls-fips-1.0.9.jar:1.0.9]
»   at org.bouncycastle.jsse.provider.ProvSSLEngine.unwrap(ProvSSLEngine.java:445) ~[bctls-fips-1.0.9.jar:1.0.9]
»   at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:679) ~[?:?]
»   at io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:298) ~[netty-handler-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1344) ~[netty-handler-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1237) ~[netty-handler-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1286) ~[netty-handler-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507) ~[netty-codec-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446) ~[netty-codec-4.1.66.Final.jar:4.1.66.Final]
»   ... 16 more
»  Caused by: javax.crypto.AEADBadTagException: Error finalising cipher data: mac check in GCM failed
»   at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:?]
»   at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:78) ~[?:?]
»   at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
»   at java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499) ~[?:?]
»   at java.lang.reflect.Constructor.newInstance(Constructor.java:480) ~[?:?]
»   at org.bouncycastle.jcajce.provider.ClassUtil.throwBadTagException(Unknown Source) ~[bc-fips-1.0.2.jar:?]
»   at org.bouncycastle.jcajce.provider.BaseCipher.engineDoFinal(Unknown Source) ~[bc-fips-1.0.2.jar:?]
»   at org.bouncycastle.jcajce.provider.BaseCipher.engineDoFinal(Unknown Source) ~[bc-fips-1.0.2.jar:?]
»   at javax.crypto.Cipher.doFinal(Cipher.java:2152) ~[?:?]
»   at org.bouncycastle.tls.crypto.impl.jcajce.JceAEADCipherImpl.doFinal(JceAEADCipherImpl.java:140) ~[bctls-fips-1.0.9.jar:1.0.9]
»   at org.bouncycastle.tls.crypto.impl.TlsAEADCipher.decodeCiphertext(TlsAEADCipher.java:263) ~[bctls-fips-1.0.9.jar:1.0.9]
»   at org.bouncycastle.tls.RecordStream.decodeAndVerify(RecordStream.java:232) ~[bctls-fips-1.0.9.jar:1.0.9]
»   at org.bouncycastle.tls.RecordStream.readFullRecord(RecordStream.java:183) ~[bctls-fips-1.0.9.jar:1.0.9]
»   at org.bouncycastle.tls.TlsProtocol.safeReadFullRecord(TlsProtocol.java:727) ~[bctls-fips-1.0.9.jar:1.0.9]
»   at org.bouncycastle.tls.TlsProtocol.offerInput(TlsProtocol.java:1059) ~[bctls-fips-1.0.9.jar:1.0.9]
»   at org.bouncycastle.tls.TlsProtocol.offerInput(TlsProtocol.java:1027) ~[bctls-fips-1.0.9.jar:1.0.9]
»   at org.bouncycastle.jsse.provider.ProvSSLEngine.unwrap(ProvSSLEngine.java:445) ~[bctls-fips-1.0.9.jar:1.0.9]
»   at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:679) ~[?:?]
»   at io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:298) ~[netty-handler-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1344) ~[netty-handler-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1237) ~[netty-handler-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1286) ~[netty-handler-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507) ~[netty-codec-4.1.66.Final.jar:4.1.66.Final]
»   at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446) ~[netty-codec-4.1.66.Final.jar:4.1.66.Final]
»   ... 16 more

Build scan: https://gradle-enterprise.elastic.co/s/rzr4h6ecezfke/tests/:x-pack:plugin:transform:qa:multi-node-tests:javaRestTest/org.elasticsearch.xpack.transform.integration.continuous.TransformContinuousIT/testContinousEvents

Reproduction line: ./gradlew ':x-pack:plugin:transform:qa:multi-node-tests:javaRestTest' --tests "org.elasticsearch.xpack.transform.integration.continuous.TransformContinuousIT.testContinousEvents" -Dtests.seed=D33C8F79C141C5B8 -Dtests.locale=pl -Dtests.timezone=Etc/GMT+4 -Druntime.java=16 -Dtests.fips.enabled=true

Applicable branches: 7.15

Reproduces locally?: Didn't try

Failure history: https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.xpack.transform.integration.continuous.TransformContinuousIT&tests.test=testContinousEvents

Failure excerpt:

org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=node_not_connected_exception, reason=[javaRestTest-1][127.0.0.1:44922] Node not connected]

  at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:176)
  at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:2011)
  at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1988)
  at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1745)
  at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1717)
  at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1684)
  at org.elasticsearch.client.TransformClient.getTransformStats(TransformClient.java:141)
  at org.elasticsearch.xpack.transform.integration.continuous.TransformContinuousIT.getTransformStats(TransformContinuousIT.java:489)
  at org.elasticsearch.xpack.transform.integration.continuous.TransformContinuousIT.lambda$waitUntilTransformsProcessedNewData$3(TransformContinuousIT.java:503)
  at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:1027)
  at org.elasticsearch.xpack.transform.integration.continuous.TransformContinuousIT.waitUntilTransformsProcessedNewData(TransformContinuousIT.java:502)
  at org.elasticsearch.xpack.transform.integration.continuous.TransformContinuousIT.testContinousEvents(TransformContinuousIT.java:274)
  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-2)
  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78)
  at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:567)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
  at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
  at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
  at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
  at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
  at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
  at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
  at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
  at java.lang.Thread.run(Thread.java:831)
elasticmachine commented 2 years ago

Pinging @elastic/es-security (Team:Security)

tvernum commented 2 years ago

I don't think this is really FIPS related - it just happened on a FIPS enabled JVM.

This

»  Caused by: javax.crypto.AEADBadTagException: Error finalising cipher data: mac check in GCM failed

can occur when the read side of the socket receives a partial block of encrypted data. GCM uses something like a checksum, and if the block is only partially received, the checksum will be wrong.

That fits with this:

org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=node_not_connected_exception, reason=[javaRestTest-1][127.0.0.1:44922] Node not connected]

If a node disconnected suddenly without closing the socket cleanly, you could get a TLS GCM error.

However, that's not the end of the story. If we look at the logs for javaRestTest-1 to find out why it's not connected we see:

[2021-11-02T13:43:52,327][WARN ][o.e.t.TcpTransport       ] [javaRestTest-1] exception caught on transport layer [Netty4TcpChannel{localAddress=/127.0.0.1:44922, remoteAddress=/127.0.0.1:55548, profile=default}], closing connection
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLException: org.bouncycastle.tls.TlsFatalAlert: bad_record_mac(20)

It looks like some sort of network issue. Possibly a bug in BC-FIPS, but there's not real reason to thing that's more likely than any other cause.

I think we're can put this one down to noise for now, but if we see it again (which might be in another test cluster suite), we might need to investigate further.

ywangd commented 2 years ago

One more in my PR CI: https://gradle-enterprise.elastic.co/s/3f3dhjt7bz5hy

It's also a FIPS one ...