bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0

Missing stderr for failed test XML creation #12649

Open Flamefire opened 3 years ago

Flamefire commented 3 years ago

Description of the problem / feature request:

I'm running some TensorFlow tests with Bazel, but on our multi-core POWER9 system they fail with, e.g.:

ERROR: /dev/shm/s3248973-EasyBuild/TensorFlow/2.4.0/fosscuda-2019b-Python-3.7.4/TensorFlow/tensorflow-r2.4/tensorflow/core/platform/BUILD:1142:11: failed (Exit 1): generate-xml.sh failed: error executing command

That is, there is no useful error message: Bazel only reports that it failed to execute that script, which comes from the Bazel installation. I verified that the executed command (as printed by bazel -s) runs correctly, so the script does exist.

The underlying problem is an unset LD_LIBRARY_PATH (see #12579), but the bug reported here is that stderr is not shown, which made troubleshooting incredibly laborious.
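
For readers unfamiliar with why the variable goes missing: the failing action is launched through `env -` (visible in the log further down), which starts the child with an empty environment apart from the variables listed explicitly. The following is a minimal sketch of that effect only, not of Bazel itself; the path `/opt/example/lib` is a made-up value for illustration.

```shell
#!/bin/sh
# Sketch: `env -` starts the child process with an empty environment, so a
# variable exported in the calling shell is not visible to the child.
LD_LIBRARY_PATH=/opt/example/lib   # hypothetical path, for illustration only
export LD_LIBRARY_PATH

# Child started normally: it inherits LD_LIBRARY_PATH.
inherited=$(sh -c 'printf "%s" "${LD_LIBRARY_PATH:-unset}"')

# Child started through `env -` with only PATH set, as in the failing action.
cleared=$(env - PATH=/usr/bin:/bin sh -c 'printf "%s" "${LD_LIBRARY_PATH:-unset}"')

echo "inherited: $inherited"
echo "cleared:   $cleared"
```

The first line printed shows the exported path, the second shows `unset`. Any binary that depends on libraries found only through LD_LIBRARY_PATH can therefore fail inside such an action even though it works in an interactive shell.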

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Sorry, the only thing I have is the command I use to test TF:

bazel --output_base=/dev/shm/s3248973-EasyBuild/TensorFlow/2.4.0/fosscuda-2019b-Python-3.7.4/tmptspeEg-bazel-tf/output_base --install_base=/dev/shm/s3248973-EasyBuild/TensorFlow/2.4.0/fosscuda-2019b-Python-3.7.4/tmptspeEg-bazel-tf/output_base/inst_base --output_user_root=/dev/shm/s3248973-EasyBuild/TensorFlow/2.4.0/fosscuda-2019b-Python-3.7.4/tmptspeEg-bazel-tf/output_user_root --host_jvm_args=-Xms512m --host_jvm_args=-Xmx4096m test --compilation_mode=opt --config=opt --subcommands --verbose_failures --config=noaws --jobs=64 --copt="-fPIC"  --distinct_host_configuration=false --test_output=errors --local_test_jobs=1 --build_tests_only --test_tag_filters='-no_gpu,-no_oss,-oss_serial,-benchmark-test,-no_oss_py37,-v1only' --build_tag_filters='-no_gpu,-no_oss,-oss_serial,-benchmark-test,-no_oss_py37,-v1only'  --//tensorflow/c:c_test

What operating system are you running Bazel on?

RHEL 7.6

What's the output of bazel info release?

release 3.4.1- (@non-git)

If bazel info release returns "development version" or "(@non-git)", tell us how you built Bazel.

EXTRA_BAZEL_ARGS="--jobs=176 --host_javabase=@local_jdk//:jdk" ./compile.sh

Have you found anything relevant by searching the web?

#12579 #4137

Any other information, logs, or outputs that you want to share?

ERROR: /dev/shm/s3248973-EasyBuild/TensorFlow/2.4.0/fosscuda-2019b-Python-3.7.4/TensorFlow/tensorflow-r2.4/tensorflow/core/platform/BUILD:1142:11:  failed (Exit 1): generate-xml.sh failed: error executing command 
  (cd /dev/shm/s3248973-EasyBuild/TensorFlow/2.4.0/fosscuda-2019b-Python-3.7.4/tmptspeEg-bazel-tf/output_base/execroot/org_tensorflow && \
  exec env - \
    PATH=/usr/bin:/bin \
    TEST_BINARY=tensorflow/core/platform/platform_strings_test \
    TEST_NAME=//tensorflow/core/platform:platform_strings_test \
    TEST_SHARD_INDEX=0 \
    TEST_TOTAL_SHARDS=0 \
  external/bazel_tools/tools/test/generate-xml.sh bazel-out/ppc-opt/testlogs/tensorflow/core/platform/platform_strings_test/test.log bazel-out/ppc-opt/testlogs/tensorflow/core/platform/platform_strings_test/test.xml 0 0)
Execution platform: @local_execution_config_platform//:platform
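
As long as Bazel does not show the script's stderr, one possible way to look for the hidden message is to re-run the printed command by hand under the same cleared environment and redirect stderr to a file. The sketch below assumes a POSIX shell; `run_clean` is a hypothetical helper (not part of Bazel), and the failing command is simulated, so this only illustrates the capture technique and may not reproduce the original failure.

```shell
#!/bin/sh
# run_clean is a hypothetical helper, not part of Bazel.
# Usage: run_clean STDERR_FILE COMMAND [ARGS...]
# Runs COMMAND with an empty environment except PATH (mirroring the
# `env - PATH=/usr/bin:/bin` prefix in the log above) and writes the
# command's stderr to STDERR_FILE. Returns the command's exit status.
run_clean() {
    err_file=$1
    shift
    env - PATH=/usr/bin:/bin "$@" 2>"$err_file"
}

# Demonstration with a simulated failing command.
err_log=$(mktemp)
status=0
run_clean "$err_log" sh -c 'echo "simulated failure" >&2; exit 1' || status=$?
captured=$(cat "$err_log")
rm -f "$err_log"

echo "exit status: $status"
echo "captured stderr: $captured"
```

In a real session the simulated `sh -c ...` would be replaced by the generate-xml.sh invocation printed by --subcommands, run from the execroot directory shown in the log, with the TEST_* variables added to the `env -` line.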

github-actions[bot] commented 1 year ago

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 2+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

Flamefire commented 1 year ago

No updates here? I guess this still happens

github-actions[bot] commented 2 months ago

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 90 days unless any other activity occurs. If you think this issue is still relevant and should stay open, please post any comment here and the issue will no longer be marked as stale.

Flamefire commented 2 months ago

No updates here? I guess this still happens

PikachuHyA commented 2 months ago

I also encountered the "generate-xml.sh failed: error executing command" error, but it is difficult to reproduce.