bazelbuild / rules_rust

Rust rules for Bazel
https://bazelbuild.github.io/rules_rust/
Apache License 2.0

segfault in test process glue when llvm_prof missing in rust 1.79 #2715

Open rbtcollins opened 3 months ago

rbtcollins commented 3 months ago

We see this:

```
Coverage runner: Not collecting coverage for failed test.
The following commands failed with status 139
/worker/build/8/root/bazel-out/linux_amd64-fastbuild/bin/build/rules/csdd/test-765406358/unit_test.runfiles/_main/build/rules/csdd/test-765406358/unit_test
thread 'main' panicked at external/rules_rust~/util/collect_coverage/collect_coverage.rs:140:10:
Failed to spawn llvm-profdata process: Os { code: 2, kind: NotFound, message: "No such file or directory" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

This happens with Rust 1.79 but not with Rust 1.77.

We have neither llvm_profdata nor llvm_cov set / referenced in our Bazel code.

bazel test output

```
exited with error code -1
Standard Output
Generated test.log (if the file is not UTF-8, then this may be unreadable):
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //build/rules/csdd:unit_test
-----------------------------------------------------------------------------
running 3 tests
test services::redis::tests::test_redis ... ok
test services::postgres::tests::test_postgres ... ok
test services::pubsub::tests::test_pubsub ... ok
test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 3.25s
external/bazel_tools/tools/test/collect_coverage.sh: line 166: 4054 Segmentation fault "$@"
--
Coverage runner: Not collecting coverage for failed test.
The following commands failed with status 139
/worker/build/8/root/bazel-out/linux_amd64-fastbuild/bin/build/rules/csdd/test-765406358/unit_test.runfiles/_main/build/rules/csdd/test-765406358/unit_test
thread 'main' panicked at external/rules_rust~/util/collect_coverage/collect_coverage.rs:140:10:
Failed to spawn llvm-profdata process: Os { code: 2, kind: NotFound, message: "No such file or directory" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Jun 25, 2024 6:12:09 AM com.google.devtools.coverageoutputgenerator.Main getTracefiles
INFO: No lcov file found.
Jun 25, 2024 6:12:09 AM com.google.devtools.coverageoutputgenerator.Main getGcovInfoFiles
INFO: No gcov info file found.
Jun 25, 2024 6:12:09 AM com.google.devtools.coverageoutputgenerator.Main getGcovJsonInfoFiles
INFO: No gcov json file found.
Jun 25, 2024 6:12:09 AM com.google.devtools.coverageoutputgenerator.Main getProfdataFileOrNull
INFO: No .profdata file found.
Jun 25, 2024 6:12:09 AM com.google.devtools.coverageoutputgenerator.Main runWithArgs
WARNING: There was no coverage found.
Action details (uncached result): http://bb-browser.iap.....dev/blobs/sha256/historical_execute_response/23f538e6964c6cb98c49831a6b060c7bd9b1646639d68d0fcc0a827a628f9339-1193/
```

mortenmj commented 2 months ago

I did a little digging here, and have made some observations.

  1. Calling llvm_profdata fails with the same error on 1.77, but does not cause a crash there. The crash first appears with 1.78, which updated the bundled version of LLVM (to v18).
  2. The location of that binary is given by the `RUST_LLVM_PROFDATA` env var, which is set to `rules_rust~~rust~rust_linux_x86_64__x86_64-unknown-linux-gnu__stable_tools/lib/rustlib/x86_64-unknown-linux-gnu/bin/llvm-profdata` by default. This is first combined with the execroot path, and if no binary is found at that location it is then looked for in the test's runfiles. The binary does not exist at either location.
  3. If I set `--@rules_rust//rust/settings:experimental_use_coverage_metadata_files`, the value of `RUST_LLVM_PROFDATA` becomes `external/rules_rust~~rust~rust_linux_x86_64__x86_64-unknown-linux-gnu__stable_tools/lib/rustlib/x86_64-unknown-linux-gnu/bin/llvm-profdata`. With the flag set, the file is also made available at that path, so the path is valid. The binary can now be invoked, but it instead errors out with `llvm-profdata: error while loading shared libraries: libLLVM-17-rust-1.77.0-stable.so: cannot open shared object file: No such file or directory`. `libLLVM-17-rust-1.77.0-stable.so` is not part of this action's files, so it won't be found unless it happens to be available on the host. That means this might work if the user has the same Rust version installed, but it will typically fail with remote execution, which is where I've seen it.
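
To make the lookup order in point 2 concrete, here is a minimal sketch of that kind of two-step resolution: try the execroot first, then fall back to the runfiles. This is not the actual `collect_coverage.rs` code; `resolve_tool` and its parameters are hypothetical names chosen for illustration. Returning `None` when neither location has the file would let a caller report a clear error instead of failing when it tries to spawn the process.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not the rules_rust implementation): resolve a
/// relative tool path, such as the value of RUST_LLVM_PROFDATA, by trying
/// the execroot first and the test's runfiles second.
fn resolve_tool(execroot: &Path, runfiles: &Path, tool: &Path) -> Option<PathBuf> {
    [execroot, runfiles]
        .iter()
        // Build the candidate path under each base directory, in order.
        .map(|base| base.join(tool))
        // Keep the first candidate that exists as a regular file.
        .find(|candidate| candidate.is_file())
}
```

In the situation described above, both candidates are missing, so a helper like this would return `None` and the caller could skip coverage collection or print which paths were tried.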