bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0

unable to finalize action: Missing digest: <hash>/<len> for ...jdeps #22854

Open rbeasley-avgo opened 1 month ago

rbeasley-avgo commented 1 month ago

Description of the bug:

Since upgrading to Bazel 7, we've encountered numerous sporadic build failures. Most are covered by other GitHub issues, but AFAICT nobody's filed one about .jdeps files.

I am going to experiment with --noexperimental_inmemory_jdeps_files.

Which category does this issue belong to?

No response

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Unknown.

Which operating system are you running Bazel on?

Linux

What is the output of bazel info release?

release 7.2.0-vmware

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

This is just Bazel 7.2.0 with a handful of patches for PRs that are either outstanding or have been rejected. None are related to scheduling, remote caching, etc.

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.

No response

Have you found anything relevant by searching the web?

Any other information, logs, or outputs that you want to share?

Our RBE implementation is Buildfarm.

We're using the following options:

# RBE-related flags
--remote_download_outputs=all
--internal_spawn_scheduler
--spawn_strategy=dynamic
--dynamic_local_strategy=worker,sandboxed,local
--remote_retries=5
--experimental_remote_cache_eviction_retries=5
--verbose_failures
--remote_cache=
--disk_cache=
--noremote_upload_local_results
--experimental_remote_cache_async
--experimental_remote_merkle_tree_cache
--remote_local_fallback
--remote_local_fallback_strategy=sandboxed
--experimental_remote_downloader_local_fallback
--remote_cache_compression

# Workaround for https://github.com/bazelbuild/bazel/issues/22387 .
build --noexperimental_inmemory_dotd_files

In failing builds with this syndrome, java.log contains backtraces resembling the following:

com.google.devtools.build.lib.remote.common.BulkTransferException: Missing digest: HASH/LEN for LABEL.jdeps
        at com.google.devtools.build.lib.remote.util.Utils.lambda$mergeBulkTransfer$4(Utils.java:656)
        at com.google.common.util.concurrent.CombinedFuture$AsyncCallableInterruptibleTask.runInterruptibly(CombinedFuture.java:165)
        at com.google.common.util.concurrent.CombinedFuture$AsyncCallableInterruptibleTask.runInterruptibly(CombinedFuture.java:153)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:75)
        at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31)
        at com.google.common.util.concurrent.CombinedFuture$CombinedFutureInterruptibleTask.execute(CombinedFuture.java:108)
        at com.google.common.util.concurrent.CombinedFuture.handleAllCompleted(CombinedFuture.java:65)
        at com.google.common.util.concurrent.AggregateFuture.processCompleted(AggregateFuture.java:301)
        at com.google.common.util.concurrent.AggregateFuture.decrementCountAndMaybeComplete(AggregateFuture.java:283)
        at com.google.common.util.concurrent.AggregateFuture.lambda$init$1(AggregateFuture.java:181)
        at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31)
        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1286)
        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:1055)
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:807)
        at com.google.common.util.concurrent.SettableFuture.setException(SettableFuture.java:55)
        at com.google.devtools.build.lib.remote.util.RxFutures$1.onError(RxFutures.java:221)
        at io.reactivex.rxjava3.internal.operators.completable.CompletableFromSingle$CompletableFromSingleObserver.onError(CompletableFromSingle.java:41)
        at io.reactivex.rxjava3.internal.operators.single.SingleCreate$Emitter.tryOnError(SingleCreate.java:95)
        at io.reactivex.rxjava3.internal.operators.single.SingleCreate$Emitter.onError(SingleCreate.java:81)
        at com.google.devtools.build.lib.remote.util.AsyncTaskCache$1.onError(AsyncTaskCache.java:339)
        at com.google.devtools.build.lib.remote.util.AsyncTaskCache$Execution.onError(AsyncTaskCache.java:205)
        at io.reactivex.rxjava3.internal.operators.completable.CompletableToSingle$ToSingle.onError(CompletableToSingle.java:73)
        at io.reactivex.rxjava3.internal.operators.completable.CompletableUsing$UsingObserver.onError(CompletableUsing.java:165)
        at io.reactivex.rxjava3.internal.operators.completable.CompletablePeek$CompletableObserverImplementation.onError(CompletablePeek.java:95)
        at io.reactivex.rxjava3.internal.operators.completable.CompletablePeek$CompletableObserverImplementation.onError(CompletablePeek.java:95)
        at io.reactivex.rxjava3.internal.operators.completable.CompletableCreate$Emitter.tryOnError(CompletableCreate.java:91)
        at io.reactivex.rxjava3.internal.operators.completable.CompletableCreate$Emitter.onError(CompletableCreate.java:77)
        at com.google.devtools.build.lib.remote.util.RxFutures$OnceCompletableOnSubscribe$1.onFailure(RxFutures.java:102)
        at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1119)
        at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31)
        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1286)
        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:1055)
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:807)
        at com.google.common.util.concurrent.SettableFuture.setException(SettableFuture.java:55)
        at com.google.devtools.build.lib.remote.RemoteCache$3.onFailure(RemoteCache.java:381)
        at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1119)
        at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31)
        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1286)
        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:1055)
        at com.google.common.util.concurrent.AbstractFuture.setFuture(AbstractFuture.java:850)
        at com.google.common.util.concurrent.AbstractCatchingFuture.run(AbstractCatchingFuture.java:125)
        at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31)
        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1286)
        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:1055)
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:807)
        at com.google.common.util.concurrent.AbstractTransformFuture.run(AbstractTransformFuture.java:105)
        at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31)
        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1286)
        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:1055)
        at com.google.common.util.concurrent.AbstractFuture.setFuture(AbstractFuture.java:850)
        at com.google.common.util.concurrent.AbstractCatchingFuture$AsyncCatchingFuture.setResult(AbstractCatchingFuture.java:216)
        at com.google.common.util.concurrent.AbstractCatchingFuture$AsyncCatchingFuture.setResult(AbstractCatchingFuture.java:192)
        at com.google.common.util.concurrent.AbstractCatchingFuture.run(AbstractCatchingFuture.java:144)
        at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31)
        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1286)
        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:1055)
        at com.google.common.util.concurrent.AbstractFuture.setFuture(AbstractFuture.java:850)
        at com.google.common.util.concurrent.AbstractCatchingFuture$AsyncCatchingFuture.setResult(AbstractCatchingFuture.java:216)
        at com.google.common.util.concurrent.AbstractCatchingFuture$AsyncCatchingFuture.setResult(AbstractCatchingFuture.java:192)
        at com.google.common.util.concurrent.AbstractCatchingFuture.run(AbstractCatchingFuture.java:144)
        at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31)
        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1286)
        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:1055)
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:807)
        at com.google.common.util.concurrent.AbstractTransformFuture.run(AbstractTransformFuture.java:105)
        at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31)
        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1286)
        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:1055)
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:807)
        at com.google.common.util.concurrent.SettableFuture.setException(SettableFuture.java:55)
        at com.google.devtools.build.lib.remote.util.RxFutures$2.onError(RxFutures.java:259)
        at io.reactivex.rxjava3.internal.operators.single.SingleFlatMap$SingleFlatMapCallback$FlatMapSingleObserver.onError(SingleFlatMap.java:117)
        at io.reactivex.rxjava3.internal.operators.single.SingleUsing$UsingSingleObserver.onError(SingleUsing.java:180)
        at io.reactivex.rxjava3.internal.operators.single.SingleCreate$Emitter.tryOnError(SingleCreate.java:95)
        at io.reactivex.rxjava3.internal.operators.single.SingleCreate$Emitter.onError(SingleCreate.java:81)
        at com.google.devtools.build.lib.remote.util.RxFutures$OnceSingleOnSubscribe$1.onFailure(RxFutures.java:172)
        at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1119)
        at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31)
        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1286)
        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:1055)
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:807)
        at com.google.common.util.concurrent.SettableFuture.setException(SettableFuture.java:55)
        at com.google.devtools.build.lib.remote.GrpcCacheClient$1.onError(GrpcCacheClient.java:453)
        at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:487)
        at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
        at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
        at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
        at com.google.devtools.build.lib.remote.NetworkTimeInterceptor$NetworkTimeCall$1.onClose(NetworkTimeInterceptor.java:81)
        at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
        at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
        at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
        at com.google.devtools.build.lib.remote.logging.LoggingInterceptor$LoggingForwardingCall$1.onClose(LoggingInterceptor.java:157)
        at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:562)
        at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:70)
        at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:743)
        at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:722)
        at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
        at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1583)
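When collecting several failures for post-mortem debugging, it can help to list exactly which outputs were reported missing. The following is a minimal Python sketch, not a Bazel tool. It assumes each relevant java.log line contains the `Missing digest: <hash>/<size> for <path>` text shown in the trace above, where HASH/LEN/LABEL stand for real values, and that hashes are lowercase hex. The function name `missing_digests` is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Matches the message format from the stack trace above:
#   "Missing digest: <hash>/<size> for <path>"
# Assumptions: the hash is lowercase hex, and the path contains no whitespace
# (it is captured up to the next whitespace character).
MISSING_DIGEST_RE = re.compile(r"Missing digest: ([0-9a-f]+)/(\d+) for (\S+)")


def missing_digests(lines: Iterable[str]) -> List[Tuple[str, int, str]]:
    """Return (hash, size_bytes, path) for each distinct missing-digest report."""
    seen = set()
    results = []
    for line in lines:
        match = MISSING_DIGEST_RE.search(line)
        if match is None:
            continue
        entry = (match.group(1), int(match.group(2)), match.group(3))
        # Deduplicate in case the same failure is logged more than once.
        if entry not in seen:
            seen.add(entry)
            results.append(entry)
    return results


if __name__ == "__main__":
    import sys

    # Usage: python3 missing_digests.py <path to java.log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for digest_hash, size, path in missing_digests(log):
            print(f"{path}\t{digest_hash}/{size}")
```

The printed `<hash>/<size>` pairs are in the same form as the error message, so they can be searched for directly in other logs from the same build.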
meisterT commented 3 weeks ago

If possible, please provide a repro since it is unclear to us how it could happen and a repro would help us make progress.

rbeasley-avgo commented 3 weeks ago

Believe me, I'm trying. :) In the meantime, are there any other artifacts that could help w/ post-mortem debugging (e.g. --remote_grpc_log, java.log)? I'm happy to configure our builds to collect more information, sanitize it, and share here. If not, no worries; I'll do what I can to repeat the failure and home in on a repro case.

tjgq commented 3 weeks ago

I suspect this might be the same as https://github.com/bazelbuild/bazel/issues/22387 because .d and .jdeps files are handled similarly by Bazel. Does setting --noexperimental_inmemory_jdeps_files make the issue go away?

A log from --experimental_remote_grpc_log would be useful. Feel free to sanitize file names, input arguments, etc., but please leave the digests intact (or rewrite them consistently) so they can be correlated across requests.
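One way to rewrite digests "in a consistent manner" is to replace each distinct hash with a stable placeholder, so the same blob can still be followed from request to request after sanitization. The following is a minimal Python sketch that operates on a log already converted to plaintext. It assumes digests are SHA-256 hashes (Bazel's default digest function) rendered as 64 lowercase hex characters. The function name `pseudonymize_digests` and the `HASH_<n>` placeholder scheme are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# Assumption: digests are SHA-256 hashes rendered as 64 lowercase hex characters.
# Word boundaries keep the pattern from matching inside a longer hex run.
SHA256_HEX_RE = re.compile(r"\b[0-9a-f]{64}\b")


def pseudonymize_digests(
    text: str, mapping: Optional[Dict[str, str]] = None
) -> Tuple[str, Dict[str, str]]:
    """Replace every SHA-256 hex hash in text with a stable placeholder.

    A given hash always maps to the same placeholder, so it can still be
    correlated across requests. Sizes (the part after "/") are left untouched.
    Pass the returned mapping back in to keep placeholders consistent across
    several log files.
    """
    mapping = {} if mapping is None else mapping

    def replace(match: "re.Match[str]") -> str:
        digest_hash = match.group(0)
        if digest_hash not in mapping:
            mapping[digest_hash] = f"HASH_{len(mapping):06d}"
        return mapping[digest_hash]

    return SHA256_HEX_RE.sub(replace, text), mapping
```

Keeping the mapping private on the reporter's side means the placeholder for the digest named in a `Missing digest` error can still be identified and shared without exposing the real hash.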

rbeasley-avgo commented 3 weeks ago

Yes, this goes away when using --noexperimental_inmemory_jdeps_files. We've had no such failures since adopting that flag.

tjgq commented 1 day ago

@rbeasley-avgo Can you provide either a repro, or an --experimental_remote_grpc_log for a build exhibiting this failure? Otherwise it's going to be difficult to make progress on this.

rbeasley-avgo commented 1 day ago

Apologies for the radio silence. Was on PTO.

I haven't been able to generate a repro, so instead I'm just waiting for the west coast to wake up to review a change that removes the --noexperimental_inmemory_foo flags from our builds. Once I have a few failures, I'll collect the gRPC logs, convert to plaintext (using bazelbuild/tools_remote), sanitize, and share with you.
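Once the gRPC log is in plaintext, the next step is usually to see which requests (for example uploads, FindMissingBlobs lookups, or downloads) touched the digest reported as missing. The following is a minimal Python sketch. It assumes the converted log separates entries with lines beginning with a run of dashes. That delimiter is an assumption about the converter's output, so adjust `separator_prefix` to whatever the file actually uses. The function name `entries_mentioning` is hypothetical.

```python
from typing import List


def entries_mentioning(
    text: str, digest_hash: str, separator_prefix: str = "-----"
) -> List[str]:
    """Split a plaintext gRPC log into entries and keep those mentioning digest_hash.

    Assumption: entries are delimited by lines starting with separator_prefix.
    Works on raw hashes or on sanitized placeholders alike.
    """
    entries: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.startswith(separator_prefix):
            # A delimiter closes the entry collected so far (if any).
            if current:
                entries.append("\n".join(current))
            current = []
        else:
            current.append(line)
    # The last entry may not be followed by a delimiter.
    if current:
        entries.append("\n".join(current))
    return [entry for entry in entries if digest_hash in entry]
```

Reading the matching entries in order shows whether the blob was ever uploaded or reported present before the failed download, which is the kind of cross-request correlation the maintainers asked for.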