bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0

cc_test coverage fails when fetching from remote cache #20556

Closed: anhlinh123 closed this issue 9 months ago

anhlinh123 commented 10 months ago

Description of the bug:

Running coverage on a cc_test randomly fails with the following error:

I/O exception during sandboxed execution: Input is a directory: bazel-out/k8-fastbuild/testlogs/<path_to_the_test_dir>/_coverage

Afaik, the condition under which the bug happens might be:

Which category does this issue belong to?

Remote Execution

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Which operating system are you running Bazel on?

Ubuntu 20.04.6 LTS

What is the output of bazel info release?

release 7.0.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

Probably it happens after this commit 1267631fee7523ea3ae572c9b8c9c1fc9c8c9452, when it tries to fetch metadata of a TreeArtifact (the _coverage directory).

Have you found anything relevant by searching the web?

No.

Any other information, logs, or outputs that you want to share?

No response

sputt commented 10 months ago

Good writeup, thank you. This is blocking our upgrade to Bazel 7.0.0. It also occurs with these flags set:

coverage --experimental_fetch_all_coverage_outputs
coverage --experimental_split_coverage_postprocessing

which we set due to requiring nobuild_runfile_links.

tjgq commented 10 months ago

@anhlinh123 @sputt If you run with --verbose_failures, are you able to get the stack trace leading up to the I/O exception during sandboxed execution exception? That would help narrow down the issue.

nlou9 commented 10 months ago

We ran into this issue with py_test too. During our migration to Bazel 7.0.0, running coverage on a py_test fails with this error: I/O exception during sandboxed execution: Input is a directory: bazel-out/aarch64-fastbuild/testlogs/tests/ci/release_stable/release_stable_lint_flake8_test/_coverage

our coverage options:

test --experimental_fetch_all_coverage_outputs
test --remote_download_minimal
test --experimental_split_coverage_postprocessing
coverage --combined_report=lcov

fmeum commented 10 months ago

@bazel-io flag

fmeum commented 10 months ago

@iancha1992 This looks like another 7.0.0 regression, just want to make sure it's tracked.

iancha1992 commented 10 months ago

@fmeum @Wyverald @meteorcloudy Should this be included in the 7.0.1 patch? or is it okay to go straight to 7.1.0?

cc: @bazelbuild/triage

Wyverald commented 10 months ago

Since this is a regression in 7.0.0, ideally we should include a fix in 7.0.1. But that depends on the timeline. @tjgq do you have an estimate for how long the fix might take?

iancha1992 commented 9 months ago

@bazel-io fork 7.0.1

iancha1992 commented 9 months ago

@bazel-io fork 7.1.0

tjgq commented 9 months ago

@nlou9 Same request as above: can you please run with --verbose_failures and post the full stack trace here?

lberki commented 9 months ago

Are you setting --experimental_split_coverage_postprocessing by any chance?

If so, I suspect the root cause is the same as the coverage failure in #20753, in which case I have a fix for it that is in the process of being submitted (although CI is not really a happy camper right now, so it will take a bit of time to materialize at HEAD).

nlou9 commented 9 months ago

@lberki we are using --experimental_split_coverage_postprocessing as I mentioned above
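Since several reporters set the flag under `test` rather than `coverage`, it can be easy to miss. The following is a minimal Python sketch for listing the flags a command picks up from a single rc file. It assumes the inheritance order visible in the `--announce_rc` output in this thread (`coverage` inherits `common`, `build` and `test`), and it deliberately ignores `import` lines and `:config` sections. The function name `flags_for` is hypothetical.

```python
import shlex

# Inheritance as seen in the --announce_rc output in this thread (an assumption
# for any command not listed here): 'coverage' also reads 'test', 'build' and
# 'common' lines.
INHERITS = {
    "coverage": ["common", "build", "test", "coverage"],
    "test": ["common", "build", "test"],
    "build": ["common", "build"],
}


def flags_for(rc_text: str, command: str) -> list[str]:
    """Return the flags `command` would pick up from one rc file's text."""
    by_section: dict[str, list[str]] = {}
    for raw_line in rc_text.splitlines():
        # shlex handles quoting and strips '#' comments.
        words = shlex.split(raw_line, comments=True)
        if not words:
            continue
        by_section.setdefault(words[0], []).extend(words[1:])

    flags: list[str] = []
    for section in INHERITS.get(command, ["common", command]):
        flags.extend(by_section.get(section, []))
    return flags


def uses_split_postprocessing(rc_text: str) -> bool:
    """True if 'bazel coverage' would see the split post-processing flag."""
    flag = "--experimental_split_coverage_postprocessing"
    return flag in flags_for(rc_text, "coverage")
```

Feeding it the four rc lines posted earlier in the thread would report the flag as active for `coverage` even though it is declared on `test` lines.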

nlou9 commented 9 months ago

@tjgq, here is the log

bazel coverage //...
Failed in 02:17
2024/01/09 02:29:07 Downloading https://releases.bazel.build/7.0.0/release/bazel-7.0.0-linux-arm64...
2024/01/09 02:29:07 Skipping basic authentication for releases.bazel.build because no credentials found in /home/semaphore/.netrc
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
(02:29:13) INFO: Invocation ID: d0d3983c-0fcd-4822-974a-8cda4a8476d5
(02:29:13) INFO: Options provided by the client:
(02:29:13) INFO: Reading rc options for 'coverage' from /home/semaphore/.bazelrc:
  Inherited 'common' options:
(02:29:13) INFO: Reading rc options for 'coverage' from /home/semaphore/ci-tools/.bazelrc:
  Inherited 'build' options: --remote_retries=2 --remote_timeout=7200 --java_runtime_version=remotejdk_11 --enable_bzlmod=false
(02:29:13) INFO: Reading rc options for 'coverage' from /home/semaphore/.bazelrc:
  Inherited 'build' options: --announce_rc --show_timestamps --show_progress_rate_limit=60 --curses=no --remote_download_minimal
(02:29:13) INFO: Reading rc options for 'coverage' from /home/semaphore/ci-tools/.bazelrc:
  Inherited 'test' options: --test_output=errors --test_timeout=-1,-1,-1,7200
(02:29:13) INFO: Reading rc options for 'coverage' from /home/semaphore/.bazelrc:
  Inherited 'test' options: --test_output=errors --experimental_fetch_all_coverage_outputs --experimental_split_coverage_postprocessing
(02:29:13) INFO: Reading rc options for 'coverage' from /home/semaphore/ci-tools/.bazelrc:
  'coverage' options: --combined_report=lcov --verbose_failures --instrumentation_filter=^// --instrumentation_filter=^//confluent[/:],-.*(test|tests|lint)
(02:29:13) INFO: Current date is 2024-01-09
(02:29:13) Computing main repo mapping:
(02:29:18) DEBUG: /home/semaphore/.cache/bazel/_bazel_semaphore/c2e391e2c21d1440479c3b26017f505f/external/rules_python/python/pip.bzl:47:10: pip_install is deprecated. Please switch to pip_parse. pip_install will be removed in a future release.
(02:29:19) Loading:
(02:29:19) Loading: 0 packages loaded
(02:29:20) Analyzing: 324 targets (26 packages loaded, 0 targets configured)
(02:29:20) Analyzing: 324 targets (26 packages loaded, 0 targets configured)
[0 / 1] checking cached actions
(02:30:21) Analyzing: 324 targets (199 packages loaded, 10070 targets configured)
[1 / 2] checking cached actions
(02:30:32) INFO: Analyzed 324 targets (219 packages loaded, 11036 targets configured).
(02:31:21) [622 / 741] Creating runfiles tree bazel-out/aarch64-fastbuild/bin/tests/ci/release_stable/release/test_builder_first_rc_from_repo.runfiles; 45s local ... (112 actions, 93 running)
(02:31:23) ERROR: /home/semaphore/ci-tools/tests/ci/scripts/BUILD.bazel:44:8: Testing //tests/ci/scripts:test_update_version_integration_lint_flake8_test failed: I/O exception during sandboxed execution: Input is a directory: bazel-out/aarch64-fastbuild/testlogs/tests/ci/scripts/test_update_version_integration_lint_flake8_test/_coverage
(02:31:23) INFO: Elapsed time: 135.863s, Critical Path: 47.72s

lberki commented 9 months ago

Then I believe https://github.com/iancha1992/bazel/commit/9463a4f50981785da493097e1197799c525b6278 is the fix (it'll be both in 7.0.1 and 7.1.0)

nlou9 commented 9 months ago

@lberki What is the ETA for 7.0.1 or 7.1.0?

lberki commented 9 months ago

I think the idea is that 7.0.1 should be out sometime next week at the latest (don't take this as a promise until @meteorcloudy confirms).

lberki commented 9 months ago

Also, @tjgq suspects that there is another bug lurking in the deep that could cause your build to fail like this and he was working on confirming or denying that. My guess is that the above change is enough, but that's just a guess and an uninformed one.

tjgq commented 9 months ago

If we have confirmation that the repro requires --experimental_split_coverage_postprocessing, I believe https://github.com/bazelbuild/bazel/commit/b0db044227d62178a7e578b8e03c452d8c17af33 will fix it. (Otherwise, if it's supposed to repro without --experimental_split_coverage_postprocessing, I couldn't do so following the instructions above.)

lberki commented 9 months ago

Thanks, your mention of AbstractActionInputPrefetcher scared me, but if that's not involved in this breakage, I'm relieved.

UebelAndre commented 9 months ago

I see this on python tests as well. This issue blocks my ability to upgrade Bazel.

anhlinh123 commented 9 months ago

@tjgq Sorry for the late reply. This is the error message when using --verbose_failures:

ERROR: /<path_to_the_test>/BUILD:50:8: Testing //<path>:UnitTest failed: I/O exception during sandboxed execution: 11 errors during bulk transfer:
com.google.devtools.build.lib.actions.DigestOfDirectoryException: Input is a directory: bazel-out/k8-fastbuild/testlogs/<path>/UnitTest/_coverage
com.google.devtools.build.lib.actions.DigestOfDirectoryException: Input is a directory: bazel-out/k8-fastbuild/testlogs/<path>/UnitTest/_coverage
com.google.devtools.build.lib.actions.DigestOfDirectoryException: Input is a directory: bazel-out/k8-fastbuild/testlogs/<path>/UnitTest/_coverage
com.google.devtools.build.lib.actions.DigestOfDirectoryException: Input is a directory: bazel-out/k8-fastbuild/testlogs/<path>/UnitTest/_coverage
com.google.devtools.build.lib.actions.DigestOfDirectoryException: Input is a directory: bazel-out/k8-fastbuild/testlogs/<path>/UnitTest/_coverage
com.google.devtools.build.lib.actions.DigestOfDirectoryException: Input is a directory: bazel-out/k8-fastbuild/testlogs/<path>/UnitTest/_coverage
com.google.devtools.build.lib.actions.DigestOfDirectoryException: Input is a directory: bazel-out/k8-fastbuild/testlogs/<path>/UnitTest/_coverage
com.google.devtools.build.lib.actions.DigestOfDirectoryException: Input is a directory: bazel-out/k8-fastbuild/testlogs/<path>/UnitTest/_coverage
com.google.devtools.build.lib.actions.DigestOfDirectoryException: Input is a directory: bazel-out/k8-fastbuild/testlogs/<path>/UnitTest/_coverage
com.google.devtools.build.lib.actions.DigestOfDirectoryException: Input is a directory: bazel-out/k8-fastbuild/testlogs/<path>/UnitTest/_coverage
com.google.devtools.build.lib.actions.DigestOfDirectoryException: Input is a directory: bazel-out/k8-fastbuild/testlogs/<path>/UnitTest/_coverage

I don't think there will be a stacktrace as there is a function consuming all Exceptions. I don't remember what that is since it's been a while. Let me try to recall.

anhlinh123 commented 9 months ago

Ok, so it all started with this function: https://github.com/bazelbuild/bazel/blob/7.0.0/src/main/java/com/google/devtools/build/lib/remote/RemoteSpawnCache.java#L82 which ends up swallowing every IOException: https://github.com/bazelbuild/bazel/blob/7.0.0/src/main/java/com/google/devtools/build/lib/remote/RemoteSpawnCache.java#L142

But the critical point seems to be here: https://github.com/bazelbuild/bazel/blob/7.0.0/src/main/java/com/google/devtools/build/lib/remote/AbstractActionInputPrefetcher.java#L418 where it tries to fetch metadata of a tree artifact. I'm not sure why it does that. But at that point, the tree artifact is a directory (I guess that is because the content of the directory was already evicted from the remote cache), and it breaks this code https://github.com/bazelbuild/bazel/blob/7.0.0/src/main/java/com/google/devtools/build/lib/exec/SingleBuildFileCache.java#L78
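To illustrate the failure mode described above, here is a small Python analogue, not Bazel's actual code. It assumes that the file cache computes a digest for a single regular file and refuses directories, which is what the "Input is a directory" message suggests. `DigestOfDirectoryError`, `file_digest` and `tree_digests` are hypothetical names chosen for this sketch.

```python
import hashlib
import os


class DigestOfDirectoryError(OSError):
    """Hypothetical stand-in for Bazel's DigestOfDirectoryException."""


def file_digest(path: str) -> str:
    """SHA-256 of one regular file; a directory is rejected outright."""
    if os.path.isdir(path):
        raise DigestOfDirectoryError(f"Input is a directory: {path}")
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def tree_digests(root: str) -> dict[str, str]:
    """Handle a directory output by digesting each contained file separately."""
    digests: dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            digests[os.path.relpath(full_path, root)] = file_digest(full_path)
    return digests
```

Calling `file_digest` on a `_coverage` directory raises, while `tree_digests` on the same path succeeds. That mirrors the distinction between treating the tree artifact as one input and expanding it into its files.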

anhlinh123 commented 9 months ago

My repo also uses --experimental_split_coverage_postprocessing. But I believe the flag isn't needed to reproduce the error (if the flag is disabled by default). It can easily be reproduced locally with these steps:

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

1. Use disk_cache instead of remote_cache to easily manipulate the cache (add --disk_cache=<path_to_local_cache> to the command line).
2. Run a cc_test: bazel coverage <test> --disk_cache=<cache> --remote_upload_local_results=true --execution_log_json_file=<file_name>.
3. Open the JSON file and find the action that has "commandArgs": ["external/bazel_tools/tools/test/collect_coverage.sh"].
4. Find the hash of the action output (near the end of the action object).
5. Delete the file corresponding to that hash value in the cache directory.
6. Rerun the test.

The test then fails with the error: I/O exception during sandboxed execution: Input is a directory: bazel-out/k8-fastbuild/testlogs/<path_to_the_test_dir>/_coverage.
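The manual steps of searching the execution log and locating the cache entry could be scripted roughly as follows. This is a sketch under two assumptions that are not confirmed in this thread: the JSON execution log is a sequence of concatenated JSON objects with `commandArgs` and `actualOutputs[].digest.hash` fields, and the disk cache stores entries under `cas/<first two hex chars>/<hash>` or `ac/<first two hex chars>/<hash>`. All function names are hypothetical, and the script only lists candidate paths instead of deleting anything.

```python
import json
import os


def iter_json_objects(text: str):
    """Yield each top-level JSON value from concatenated JSON text."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        value, pos = decoder.raw_decode(text, pos)
        # Tolerate a log that is one big JSON array instead.
        if isinstance(value, list):
            yield from value
        else:
            yield value


def coverage_output_hashes(log_text: str,
                           marker: str = "collect_coverage.sh") -> list[str]:
    """Collect output digest hashes of spawns whose argv mentions `marker`."""
    hashes: list[str] = []
    for spawn in iter_json_objects(log_text):
        if not any(marker in arg for arg in spawn.get("commandArgs", [])):
            continue
        for output in spawn.get("actualOutputs", []):
            digest_hash = output.get("digest", {}).get("hash")
            if digest_hash:
                hashes.append(digest_hash)
    return hashes


def disk_cache_candidates(cache_dir: str, digest_hash: str) -> list[str]:
    """Paths where the entry may live, under the assumed cache layout."""
    return [
        os.path.join(cache_dir, kind, digest_hash[:2], digest_hash)
        for kind in ("cas", "ac")
    ]
```

A caller would read the file passed to --execution_log_json_file, pass its contents to `coverage_output_hashes`, and check which of the `disk_cache_candidates` paths exist before removing one by hand.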

tjgq commented 9 months ago

@anhlinh123 Do you mind giving https://github.com/bazelbuild/bazel/commit/b0db044227d62178a7e578b8e03c452d8c17af33 a try? (The simplest way is to use Bazelisk with USE_BAZEL_VERSION=last_green.) That would tell us whether there's indeed a separate issue that that commit didn't fix.

Otherwise, I think there might be something missing from your repro steps. You are building once, deleting a file from the disk cache, then rebuilding incrementally. I'd thus expect the incremental rebuild to be a no-op, since the output tree hasn't been touched and Bazel can tell it's up-to-date (and that's also what I'm seeing experimentally.) Is the second build a clean build? Are there any other flags in a .bazelrc? (Use --announce_rc to print all of the flags Bazel is using.)

tjgq commented 9 months ago

(It's USE_BAZEL_VERSION=last_green, not USE_BAZEL_VERSION=latest. I've amended the previous comment.)
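For anyone scripting this check, a minimal Python sketch of invoking Bazelisk with a pinned version might look like the following. It assumes a `bazelisk` binary is on PATH. The helper names are hypothetical, and the version string is whatever value is being tested (for example `last_green` or a commit hash).

```python
import os
import subprocess


def bazelisk_invocation(version: str, args: list[str], base_env=None):
    """Build argv and environment for running Bazelisk at a given version."""
    env = dict(os.environ if base_env is None else base_env)
    # Bazelisk reads the desired Bazel version from this variable.
    env["USE_BAZEL_VERSION"] = version
    return ["bazelisk", *args], env


def run_with_version(version: str, args: list[str]) -> int:
    """Run the command and return its exit code (assumes bazelisk is installed)."""
    argv, env = bazelisk_invocation(version, args)
    return subprocess.run(argv, env=env, check=False).returncode
```

For example, `run_with_version("last_green", ["coverage", "//..."])` would rerun coverage against the last green build without changing the workspace's pinned version.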

anhlinh123 commented 9 months ago

@tjgq This is what I've found so far.

tjgq commented 9 months ago

That's great to hear, thanks. I will close this issue, since the fix has also been cherry-picked into the 7.0.1 branch (in https://github.com/bazelbuild/bazel/pull/20819).

anhlinh123 commented 9 months ago

@tjgq thank you for your support!

anhlinh123 commented 9 months ago

@tjgq Version 7.0.1rc1 has a weird bug related to this: turning --experimental_split_coverage_postprocessing on always breaks coverage.

ERROR: <path>/test/BUILD:1:8: Testing //:test failed: I/O exception during sandboxed execution: Input is a directory: bazel-out/k8-fastbuild/testlogs/test/_coverage

The last_green build doesn't have it.

tjgq commented 9 months ago

7.0.1rc1 was cut before the fix made it into the 7.0.1 branch. Can you try USE_BAZEL_VERSION=f4da34dcfe7b83388e3d963f35581a4fe710fc14 (current tip of the branch)?

anhlinh123 commented 9 months ago

@tjgq It works!!! Thank you!