bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0

bazel builds fail with "Error in download_and_extract: java.io.IOException: Couldn't delete temporary directory" #20013

Open ryanmacdonald opened 11 months ago

ryanmacdonald commented 11 months ago

Description of the bug:

Our team recently migrated our build to use Bazel v6.3.0, and we're seeing this error sometimes when we run a fresh build:

Analyzing: target <target I'm building>
INFO: Repository remote_java_tools_linux instantiated at: 
  /DEFAULT.WORKSPACE.SUFFIX:374:6: in <toplevel>
  <project_root_dir>/external/bazel_tools/tools/build_defs/repo/utils.bzl:233:18: in maybe
Repository rule http_archive defined at: 
  <project_root_dir>/external/bazel_tools/tools/build_defs/repo/http.bzl:372:31: in <toplevel>
ERROR: An error occurred during the fetch of repository 'remote_java_tools_linux':
   Traceback (most recent call last):
  File "<project_root_dir>/bazel_tools/tools/build_defs/repo/http.bzl", line 132, column 45, in _http_archive_impl
    download_info = ctx.download_and_extract(
Error in download_and_extract: java.io.IOException: Couldn't delete temporary directory (<project_root_dir>/remote_java_tools_linux/temp6718266017446538312)

As far as I understand, @remote_java_tools_linux is a dependency that Bazel pulls in to support the native build macros that compile Java code, so I'm not sure why a user would be seeing permissions issues. Any thoughts on what might be going wrong, or on how to proceed with debugging?

Which category does this issue belong to?

External Dependency

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

No response

Which operating system are you running Bazel on?

Red Hat Enterprise Linux Server v7.9

What is the output of bazel info release?

release 6.3.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

No response

Have you found anything relevant by searching the web?

No, but I solicited help for this error on Stack Overflow and the Bazel community Slack without any engagement.

Any other information, logs, or outputs that you want to share?

No response

iancha1992 commented 11 months ago

@ryanmacdonald Could you please provide sample code (repo) to reproduce this issue?

ryanmacdonald commented 11 months ago

@iancha1992 it will be difficult for me to do that, since the repo is closed source and the error occurs seemingly randomly from a fresh clone. Do you have any pointers on why this would occur or how I could proceed debugging it myself?

meteorcloudy commented 11 months ago

https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/bazel/repository/starlark/StarlarkBaseExternalContext.java;l=728-732

We should probably propagate the actual error message here to get more clues about why the deletion is failing.
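As a rough illustration of that idea (a Python sketch, not Bazel's actual Java code; `delete_temp_dir` is a hypothetical helper name), a wrapper can keep the familiar message while appending the underlying OS error, so the reason for the failure is not lost:

```python
import os
import shutil
import tempfile


def delete_temp_dir(path):
    """Delete a temporary directory, keeping the root cause in the error.

    Hypothetical helper for illustration only; it is not part of Bazel.
    """
    try:
        shutil.rmtree(path)
    except OSError as e:
        # Append the underlying error text instead of discarding it, and
        # chain the original exception so it also shows in the traceback.
        raise IOError(
            f"Couldn't delete temporary directory ({path}): {e}"
        ) from e


if __name__ == "__main__":
    scratch = tempfile.mkdtemp()
    delete_temp_dir(scratch)          # succeeds: the directory exists
    print(os.path.exists(scratch))    # the directory is gone
    try:
        delete_temp_dir(scratch)      # fails: already deleted
    except IOError as err:
        print(err)                    # message now includes the cause
```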

meteorcloudy commented 10 months ago

@ryanmacdonald I created a custom binary based on 6.4.0 release with the actual error message propagated: https://github.com/bazelbuild/bazel/commits/release-6.4.0-gh-20013.

Can you please use Bazelisk with USE_BAZEL_VERSION=d015070467764158c34e60262c96dc62177964ad to rerun the build? Hopefully, we'll get more useful information for debugging.
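For reference, a minimal sketch of such an invocation from a Python wrapper script. It assumes `bazelisk` is on the PATH; the `build //...` arguments are placeholders for whatever target normally triggers the failure, and `bazelisk_command` and `run_with_custom_bazel` are hypothetical helper names:

```python
import os
import subprocess

# Commit hash of the custom binary mentioned above.
CUSTOM_BAZEL_COMMIT = "d015070467764158c34e60262c96dc62177964ad"


def bazelisk_command(commit, args):
    """Return (argv, env) for running Bazelisk pinned to a Bazel commit."""
    env = dict(os.environ)
    # Bazelisk reads USE_BAZEL_VERSION to decide which Bazel to download.
    env["USE_BAZEL_VERSION"] = commit
    return ["bazelisk", *args], env


def run_with_custom_bazel(commit, args):
    """Run Bazelisk with the pinned commit and return its exit code."""
    argv, env = bazelisk_command(commit, args)
    return subprocess.run(argv, env=env).returncode


if __name__ == "__main__":
    # Placeholder target pattern; substitute the real target.
    raise SystemExit(run_with_custom_bazel(CUSTOM_BAZEL_COMMIT, ["build", "//..."]))
```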

ryanmacdonald commented 10 months ago

Hi @meteorcloudy, is it possible you could create a custom binary based on the 6.3.0 release for this, or give me guidance on how to do that myself? After a recent Bazelisk auto-bump to 6.4.0, our builds started failing because of a C++ dependency error, so we're on 6.3.0 for now.

meteorcloudy commented 10 months ago

@ryanmacdonald I pushed https://github.com/bazelbuild/bazel/commits/release-6.3.0-gh-20013, please wait about half an hour until our CI publishes binaries for 37c5c4c60802341b2172dd618993e52fe6d7102f

jscheid-ventana commented 7 months ago

We have seen this, as rarely as once every six months, for at least 2.25 years: first on 4.2.1, now on 6.5.0. We do not know how to reproduce it. It likely has to do with new output bases, possibly with multiple client processes vying for a server lock.

It would be nice for the error message to be more specific. We also observe that Bazel exits with code 1, but perhaps a different code would help, especially if this turns out to be an intermittent issue that can be retried successfully.
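Until there is a distinct exit code, a wrapper can only key off the message text. A minimal sketch of that workaround, assuming the error text appears on stderr and that a retry is safe in your setup (`run_with_retry` and its `runner` parameter are hypothetical, the latter added so the logic can be exercised without Bazel):

```python
import subprocess

# Text of the intermittent failure reported in this issue.
TRANSIENT_MARKER = "Couldn't delete temporary directory"


def run_with_retry(cmd, attempts=3, runner=None):
    """Run cmd, retrying only when the known intermittent error appears.

    Returns (returncode, stderr) of the last attempt.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    if runner is None:
        def runner(argv):
            # Note: capturing stderr hides Bazel's progress output.
            proc = subprocess.run(argv, capture_output=True, text=True)
            return proc.returncode, proc.stderr

    for _ in range(attempts):
        code, stderr = runner(cmd)
        # Stop on success, or on a failure unrelated to this issue.
        if code == 0 or TRANSIENT_MARKER not in stderr:
            break
    return code, stderr
```

Matching on message text is brittle, which is part of why a dedicated exit code would be preferable.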

sthornington commented 2 weeks ago

ERROR: no such package '@@crate_index__adler2-2.0.0//': java.io.IOException: Couldn't delete temporary directory (/[...]/bazel_base/30530ab654f51d2e19c2cea9f38f4878/external/crate_index__adler2-2.0.0/temp13769777110914705970): /[...]/bazel_base/30530ab654f51d2e19c2cea9f38f4878/external/crate_index__adler2-2.0.0/temp13769777110914705970 (Directory not empty)
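
The trailing "(Directory not empty)" is the operating system's text for ENOTEMPTY, the error returned when a directory is removed while it still has entries. One possible, unconfirmed explanation is that something added a file to the temp directory after its contents had been cleared. The Python sketch below only reproduces the error code itself on Linux; it does not show what Bazel does internally, and `rmdir_errno_when_not_empty` is a hypothetical name:

```python
import errno
import os
import tempfile


def rmdir_errno_when_not_empty():
    """Return the errno raised by rmdir() on a directory that has a file."""
    with tempfile.TemporaryDirectory() as parent:
        target = os.path.join(parent, "temp_dir")
        os.mkdir(target)
        # Stand-in for a file that shows up after the directory was emptied.
        with open(os.path.join(target, "late_file"), "w"):
            pass
        try:
            os.rmdir(target)
        except OSError as e:
            return e.errno
    return None


if __name__ == "__main__":
    code = rmdir_errno_when_not_empty()
    print(code == errno.ENOTEMPTY, os.strerror(code))
```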