Closed srago closed 2 years ago
@bazel-io flag
Sorry about that. This corner of Bazel is poorly tested; I guess something like this was bound to happen.
When enabling verbose failures, the stack trace pointed to a failed precondition because the host portion of the URI was null
Could you please share the full stack trace and the exact values of --remote_executor
and --remote_cache
you're using?
Here's the command line and stack trace. The line numbers might be off a bit -- I recompiled bazel from the head and added a couple printf statements.
$ bazel build --remote_cache=unix:///home/coder/.cache/fax/3a162c244f546dff591bc0d335e9c174/jail.sock --remote_executor=unix:///home/coder/.cache/fax/3a162c244f546dff591bc0d335e9c174/jail.sock :all --verbose_failures
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Invocation ID: 55dd5c7b-43ea-4a6f-a6cf-0c05ce656dbe
ERROR: java.io.IOException: io.grpc.StatusRuntimeException: UNAUTHENTICATED: Failed computing credential metadata
at com.google.devtools.build.lib.remote.RemoteServerCapabilities.get(RemoteServerCapabilities.java:90)
at com.google.devtools.build.lib.remote.RemoteModule.verifyServerCapabilities(RemoteModule.java:196)
at com.google.devtools.build.lib.remote.RemoteModule.beforeCommand(RemoteModule.java:492)
at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.execExclusively(BlazeCommandDispatcher.java:378)
at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.exec(BlazeCommandDispatcher.java:232)
at com.google.devtools.build.lib.server.GrpcServerImpl.executeCommand(GrpcServerImpl.java:550)
at com.google.devtools.build.lib.server.GrpcServerImpl.lambda$run$1(GrpcServerImpl.java:614)
at io.grpc.Context$1.run(Context.java:566)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.grpc.StatusRuntimeException: UNAUTHENTICATED: Failed computing credential metadata
at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262)
at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243)
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156)
at build.bazel.remote.execution.v2.CapabilitiesGrpc$CapabilitiesBlockingStub.getCapabilities(CapabilitiesGrpc.java:215)
at com.google.devtools.build.lib.remote.RemoteServerCapabilities.lambda$get$0(RemoteServerCapabilities.java:85)
at com.google.devtools.build.lib.remote.ReferenceCountedChannel.lambda$withChannelBlocking$2(ReferenceCountedChannel.java:85)
at com.google.devtools.build.lib.remote.ReferenceCountedChannel.lambda$withChannel$4(ReferenceCountedChannel.java:108)
at io.reactivex.rxjava3.internal.operators.single.SingleUsing.subscribeActual(SingleUsing.java:59)
at io.reactivex.rxjava3.core.Single.subscribe(Single.java:4855)
at io.reactivex.rxjava3.internal.operators.single.SingleFlatMap$SingleFlatMapCallback.onSuccess(SingleFlatMap.java:85)
at io.reactivex.rxjava3.internal.operators.single.SingleFlatMap$SingleFlatMapCallback$FlatMapSingleObserver.onSuccess(SingleFlatMap.java:112)
at io.reactivex.rxjava3.internal.operators.single.SingleMap$MapSingleObserver.onSuccess(SingleMap.java:65)
at io.reactivex.rxjava3.internal.operators.single.SingleDoOnDispose$DoOnDisposeObserver.onSuccess(SingleDoOnDispose.java:84)
at io.reactivex.rxjava3.internal.operators.single.SingleDoOnError$DoOnError.onSuccess(SingleDoOnError.java:52)
at io.reactivex.rxjava3.internal.operators.observable.ObservableSingleSingle$SingleElementObserver.onComplete(ObservableSingleSingle.java:110)
at io.reactivex.rxjava3.internal.observers.DeferredScalarDisposable.complete(DeferredScalarDisposable.java:85)
at io.reactivex.rxjava3.subjects.AsyncSubject.subscribeActual(AsyncSubject.java:233)
at io.reactivex.rxjava3.core.Observable.subscribe(Observable.java:13176)
at io.reactivex.rxjava3.internal.operators.observable.ObservableSingleSingle.subscribeActual(ObservableSingleSingle.java:36)
at io.reactivex.rxjava3.core.Single.subscribe(Single.java:4855)
at io.reactivex.rxjava3.internal.operators.single.SingleDoOnError.subscribeActual(SingleDoOnError.java:35)
at io.reactivex.rxjava3.core.Single.subscribe(Single.java:4855)
at io.reactivex.rxjava3.internal.operators.single.SingleDoOnDispose.subscribeActual(SingleDoOnDispose.java:38)
at io.reactivex.rxjava3.core.Single.subscribe(Single.java:4855)
at io.reactivex.rxjava3.internal.operators.single.SingleMap.subscribeActual(SingleMap.java:35)
at io.reactivex.rxjava3.core.Single.subscribe(Single.java:4855)
at io.reactivex.rxjava3.internal.operators.single.SingleFlatMap$SingleFlatMapCallback.onSuccess(SingleFlatMap.java:85)
at io.reactivex.rxjava3.internal.operators.single.SingleCreate$Emitter.onSuccess(SingleCreate.java:68)
at com.google.devtools.build.lib.remote.grpc.TokenBucket$1.onNext(TokenBucket.java:79)
at io.reactivex.rxjava3.internal.util.NotificationLite.accept(NotificationLite.java:247)
at io.reactivex.rxjava3.subjects.BehaviorSubject$BehaviorDisposable.test(BehaviorSubject.java:507)
at io.reactivex.rxjava3.subjects.BehaviorSubject$BehaviorDisposable.emitFirst(BehaviorSubject.java:468)
at io.reactivex.rxjava3.subjects.BehaviorSubject.subscribeActual(BehaviorSubject.java:224)
at io.reactivex.rxjava3.core.Observable.subscribe(Observable.java:13176)
at com.google.devtools.build.lib.remote.grpc.TokenBucket.lambda$acquireToken$0(TokenBucket.java:64)
at io.reactivex.rxjava3.internal.operators.single.SingleCreate.subscribeActual(SingleCreate.java:40)
at io.reactivex.rxjava3.core.Single.subscribe(Single.java:4855)
at io.reactivex.rxjava3.internal.operators.single.SingleFlatMap.subscribeActual(SingleFlatMap.java:37)
at io.reactivex.rxjava3.core.Single.subscribe(Single.java:4855)
at io.reactivex.rxjava3.internal.operators.single.SingleDefer.subscribeActual(SingleDefer.java:43)
at io.reactivex.rxjava3.core.Single.subscribe(Single.java:4855)
at io.reactivex.rxjava3.internal.operators.single.SingleFlatMap.subscribeActual(SingleFlatMap.java:37)
at io.reactivex.rxjava3.core.Single.subscribe(Single.java:4855)
at io.reactivex.rxjava3.core.Single.blockingGet(Single.java:3644)
at com.google.devtools.build.lib.remote.ReferenceCountedChannel.withChannelBlocking(ReferenceCountedChannel.java:85)
at com.google.devtools.build.lib.remote.RemoteServerCapabilities.lambda$get$1(RemoteServerCapabilities.java:84)
at com.google.devtools.build.lib.remote.Retrier.execute(Retrier.java:244)
at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:125)
at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:114)
at com.google.devtools.build.lib.remote.RemoteServerCapabilities.get(RemoteServerCapabilities.java:82)
... 10 more
Caused by: java.lang.NullPointerException
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:889)
at com.google.devtools.build.lib.authandtls.credentialhelper.CredentialHelperProvider.findCredentialHelper(CredentialHelperProvider.java:74)
at com.google.devtools.build.lib.authandtls.credentialhelper.CredentialHelperCredentials$CredentialHelperCacheLoader.load(CredentialHelperCredentials.java:132)
at com.google.devtools.build.lib.authandtls.credentialhelper.CredentialHelperCredentials$CredentialHelperCacheLoader.load(CredentialHelperCredentials.java:114)
at com.github.benmanes.caffeine.cache.LocalLoadingCache.lambda$newMappingFunction$2(LocalLoadingCache.java:141)
at com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$14(BoundedLocalCache.java:2413)
at java.base/java.util.concurrent.ConcurrentHashMap.compute(Unknown Source)
at com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2411)
at com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2394)
at com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:108)
at com.github.benmanes.caffeine.cache.LocalLoadingCache.get(LocalLoadingCache.java:54)
at com.google.devtools.build.lib.authandtls.credentialhelper.CredentialHelperCredentials.getRequestMetadataFromCredentialHelper(CredentialHelperCredentials.java:90)
at com.google.devtools.build.lib.authandtls.credentialhelper.CredentialHelperCredentials.getRequestMetadata(CredentialHelperCredentials.java:74)
at com.google.auth.Credentials.blockingGetToCallback(Credentials.java:112)
at com.google.auth.Credentials$1.run(Credentials.java:98)
at io.grpc.stub.ClientCalls$ThreadlessExecutor.waitAndDrain(ClientCalls.java:740)
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:149)
... 57 more
ERROR: Failed to query remote execution capabilities: UNAUTHENTICATED: Failed computing credential metadata
Thanks, this is helpful. It appears that new URI("unix:///foo/bar).getHost() == null
and new URI("unix:/foo/bar").getHost() == null
, but new URI("unix://foo/bar").getHost() == "foo"
. I wonder if using one or two slashes after unix:
instead of three is a valid workaround. Hopefully the gRPC library isn't too picky...
That being said, I totally agree that we shouldn't crash here! I think the right fix is to replace the precondition check in findCredentialHelper
with an early exit if the host part is empty/null. That way we don't have to make any assumptions about URI schemes. I'll put a fix together on Monday.
Regardless of the number of slashes, it used to work with versions of bazel older than 5.3. I've used it on 5.2.0, 5.1.0, 5.0.0, and 4.2.2 to name a few. But I'll look into seeing how it behaves when we have only 1 slash and let you know what I find.
Yep, regardless of the number of slashes, it fails at the same precondition check. (I tried 1, 2, and 3.) Also, for unix domain sockets, there is no host -- the entire remainder of the string is the pathname of the socket file, so getHost() probably should return null when the scheme is "unix:".
@tjgq do you think this warrants a patch release?
@Wyverald It's a regression that affects existing users and there doesn't seem to be a workaround for it, so probably yes? (I don't know what's our patch release policy.)
@srago I'd like to go with #16184 as the fix. Could you please confirm that it works for you?
cc @Yannic - until #16185 is fixed, we should manually verify that a UDS proxy works when making changes to credential helpers.
@bazel-io fork 5.3.1
Confirmed #16184 fixes the problem with Unix domain sockets. Thanks for the quick response time!
We use a unix domain socket for our remote_executor and remote_cache endpoints, because we use a side-car proxy to handle SPIFFE mTLS negotiation. This stopped working with the release of bazel 5.3.0. With 5.3.0, bazel fails before issuing its first GetCapabilitiesRequest. The error message was "ERROR: Failed to query remote execution capabilities: UNAUTHENTICATED: Failed computing credential metadata." When enabling verbose failures, the stack trace pointed to a failed precondition because the host portion of the URI was null. I played around with the code and got it working again, but the fix isn't production-quality -- it doesn't handle the case where the remote cache endpoint differs from the remote executor endpoint, and might need different credentials. Because we're using unix domain sockets, this shouldn't affect us at all. Anyway, consider the following patch a place to start the conversation about what the correct fix might look like.