bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0
23.21k stars 4.07k forks source link

Bazel crashes with `javax.net.ssl.SSLException ... BAD_DECRYPT` #18965

Closed coryan closed 1 year ago

coryan commented 1 year ago

Description of the bug:

We use GCS as a remote cache. Our builds on macOS (and only macOS so far) fail from time to time with errors such as:

FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.RuntimeException: Unrecoverable error while evaluating node 'ActionLookupData{actionLookupKey=ConfiguredTargetKey{label=//google/cloud/aiplatform:v1_samples_vizier_client_samples, 

File:[[<execution_root>]bazel-out/darwin-fastbuild/bin]_solib_darwin_x86_64/libexternal_Scom_Ugoogle_Uprotobuf_Ssrc_Sgoogle_Sprotobuf_Slibany_Uproto.upb.dylib, File:[[<execution_root>]bazel-out/darwin-fastbuild/bin]google/cloud/aiplatform/v1_samples_vizier_client_samples]}', ...)
    at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:642)
    at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:382)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.netty.handler.codec.DecoderException: javax.net.ssl.SSLException: error:1e000065:Cipher functions:OPENSSL_internal:BAD_DECRYPT
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:480)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:279)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    ... 1 more
Caused by: javax.net.ssl.SSLException: error:1e000065:Cipher functions:OPENSSL_internal:BAD_DECRYPT
    at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.shutdownWithError(ReferenceCountedOpenSslEngine.java:1071)
    at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.sslReadErrorResult(ReferenceCountedOpenSslEngine.java:1365)
    at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1305)
    at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1392)
    at io.netty.handler.ssl.SslHandler$SslEngineType$1.unwrap(SslHandler.java:216)
    at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1342)
    at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1235)
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1284)
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:510)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:449)
    ... 21 more

https://github.com/googleapis/google-cloud-cpp/actions/runs/5582043990/jobs/10200901653

Running the build again succeeds.

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

I do not have any simple repro.

Which operating system are you running Bazel on?

macOS 12.6.6 21G646

What is the output of bazel info release?

release 6.2.1

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

N/A

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

N/A

Have you found anything relevant by searching the web?

I found a previous report:

https://github.com/bazelbuild/bazel/issues/15142

It was closed because (I think) the original bug requested an upgrade of netty, and that was done, but the motivation (fixing this bug) remained.

I also found:

https://github.com/netty/netty/issues/11815

Any other information, logs, or outputs that you want to share?

https://github.com/googleapis/google-cloud-cpp/actions/runs/5582043990/jobs/10200901653 may be handy, though I think it will expire in 90d or so.

coeuvre commented 1 year ago

Thanks for reporting the issue. I think the error is from upstream and we cannot easily fix it. However, I do understand it's annoying to see this transient error.

I am proposing to add this error and a retriable HTTP error so that Bazel can automatically resend the network request in this case. I don't know the security implications of retrying a SSL error so I am going to wait for a few days before I am actually working on the fix. Please speak up if you don't agree.

coryan commented 1 year ago

Retrying sounds like a great idea in this case.

coryan commented 1 year ago

FWIW, it seems to affect Windows too:

https://github.com/googleapis/google-cloud-cpp/actions/runs/5696264275/job/15441037965

iancha1992 commented 1 year ago

A fix for this issue has been included in Bazel 6.4.0 RC1. Please test out the release candidate and report any issues as soon as possible. Thanks!