bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0
23.25k stars 4.08k forks source link

Windows, JavaBuilder worker: JavaBuilder sometimes holds on to files #3678

Closed laszlocsomor closed 6 years ago

laszlocsomor commented 7 years ago

The JavaBuilder persistent worker sometimes holds on to files in the output tree preventing subsequent builds from overwriting those files.

This happens spuriously, but when it hits, the only solution is to kill the worker process.

C:\work\bazel>bazel build //src:bazel.exe
WARNING: C:/work/bazel/src/main/cpp/BUILD:79:12: in srcs attribute of cc_binary rule //src/main/cpp:client: please do not import '//src/main/native/windows:resources.o' directly. You should either move the file to this package or depend on an appropriate rule there.
INFO: Found 1 target...
ERROR: C:/work/bazel/src/main/java/com/google/devtools/common/options/processor/BUILD:16:1: failed to delete output files before executing action: C:/users/laszlocsomor/appdata/local/temp/_bazel_laszlocsomor/o6hbs7n0/execroot/io_bazel/bazel-out/host/bin/src/main/java/com/google/devtools/common/options/processor/liboptions_preprocessor.jar (Permission denied).
Target //src:bazel.exe failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 4.636s, Critical Path: 2.32s

C:\work\bazel>bazel build //src:bazel.exe
WARNING: C:/work/bazel/src/main/cpp/BUILD:79:12: in srcs attribute of cc_binary rule //src/main/cpp:client: please do not import '//src/main/native/windows:resources.o' directly. You should either move the file to this package or depend on an appropriate rule there.
INFO: Found 1 target...
ERROR: C:/work/bazel/src/main/java/com/google/devtools/common/options/processor/BUILD:16:1: failed to delete output files before executing action: C:/users/laszlocsomor/appdata/local/temp/_bazel_laszlocsomor/o6hbs7n0/execroot/io_bazel/bazel-out/host/bin/src/main/java/com/google/devtools/common/options/processor/liboptions_preprocessor.jar (Permission denied).
Target //src:bazel.exe failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 2.408s, Critical Path: 0.59s

The process in Process Explorer:

javabuilder

meteorcloudy commented 7 years ago

/cc @philwo

philwo commented 7 years ago

@laszlocsomor Nice screenshot. Thanks! @cushon FYI

Unless someone has an idea just from looking at the screenshot or code, we might have to run @kohsuke's File Leak Detector (http://file-leak-detector.kohsuke.org/) on the worker process and hope to get a repro. It should pinpoint where in the code that file handle is opened, which hopefully also tells us why it's not closed.

laszlocsomor commented 6 years ago

Thanks for the suggestion Philipp! I haven't seen this bug since, but leaving it at P2 because if it does show up, I'd like to deal with it.

meteorcloudy commented 6 years ago

I believe this issue has already been fixed by https://github.com/bazelbuild/bazel/commit/6a4247b10f5cf040c1a7176498bef69c75b1b286, where we explicitly specify which handles to inherit when creating process with JNI code and use that JNI function to create Java worker.

laszlocsomor commented 6 years ago

Thanks! Let's close it then.

pmuetschard commented 6 years ago

I'm still seeing this very frequently (on multiple windows machines) on bazel 0.10.0. However, only when trying to link .so's for Android NDK APKs.

For example:

ERROR: xxx/BUILD.bazel:17:1: failed to delete output files before executing action: C:/dev/tmp/_bazel_pascal/csxukb-c/execroot/xxx/bazel-out/android-arm64-v8a-fastbuild/bin/xxx/nativedeps/xxx/libxxx.so (Permission denied)
meteorcloudy commented 6 years ago

@laszlocsomor Maybe this is caused by some android toolchain not releasing the handler?

laszlocsomor commented 6 years ago

@meteorcloudy : Without knowing more, I can only guess. Luckily though, we can also find it out! Read on.

@pmuetschard : Could you please download Process Explorer and the next time you see this error, use Process Explorer to search for processes holding open that file? You can do so by pressing Ctrl+F (in Process Explorer) to bring up the search window, entering the basename of the file (e.g. "libxxx.so"), then finding the process(es) that hold that file.

pmuetschard commented 6 years ago

Unfortunately, process explorer would not find anything. The thing is, re-running bazel after a failure, usually allows it to "get past" the .so that failed. So it looks more like a race condition. In our project, we build quite a few libs ~20, and so it would then fail on another one if I just re-run bazel build. I tried using -k and searching for an so as soon as the error is printed, while bazel was still doing more work, but alas, no luck - it's as if whatever holds onto the file, releases it as soon as the failure happens. In my runs with -k it would fail on more than one .so, too. We also extract the .so files and "re-package" them by using the extracted file as input to another android_binary rule, so I estimate that there are nearly 100 points where this could fail. Since most of them pass, this indeed does look much more like a race than a systemic issue. Something to note: this only happens on "incremental" builds. I.e., I've not seen this error happen on our CI build servers, which start from a clean checkout each time. However, I have seen this on multiple different Windows machines.

laszlocsomor commented 6 years ago

I tend to eagerly suspect the AntiVirus or Bit9/Carbon Black, but in this case I believe it's neither.

The fact that you only saw this error on incremental builds but not clean ones, and that Bazel shows the "failed to delete output files..." error when it fails to delete outputs from the previous build, could be explained by a background process still holding the file open. This could be a persistent worker (perhaps JavaBuilder) that holds on to the output file, either because it doesn't close the file at all, or it doesn't close it in a timely manner.

I need to find a way to repro this locally, that's what I'll do next.

laszlocsomor commented 6 years ago

@pmuetschard : Is there anything else you notice, like any patterns in the kind of changes after which you see the errors? For example, do you change dependencies of rules, add/remove rules; or do you see the error already when you do minor edits in source files and want to rebuild?

pmuetschard commented 6 years ago

I see it for all of the above. Naturally, I see it most often for changes that cause all of the libs to be rebuilt.

I also don't think it's the AntiVirus, because I see it, too on a machine that has no anti-virus and I use a custom %TMP%, which is excluded from Windows 10's malware scanner.

FYI, I see this when building GAPID.

pmuetschard commented 6 years ago

Any updates on this? I see this very frequently. Always with Android native libraries. Below is a session. Note that I did nothing in between bazel invocation. Simply ran it again each time it failed:

C:\dev\gapid>bazel build pkg
INFO: Analysed target //:pkg (0 packages loaded).
INFO: Found 1 target...
ERROR: C:/dev/gapid/gapii/apk/BUILD.bazel:17:1: failed to delete output files before executing action: C:/users/pascal/_bazel_pascal/cmrueb7c/execroot/gapid/bazel-out/android-arm64-v8a-fastbuild/bin/_nativedeps/daba7a8824506e39f38af05e8f9311cd.so (Permission denied)
Target //:pkg failed to build
INFO: Elapsed time: 66.205s, Critical Path: 49.77s
INFO: 400 processes, local.
FAILED: Build did NOT complete successfully

C:\dev\gapid>bazel build pkg
INFO: Analysed target //:pkg (0 packages loaded).
INFO: Found 1 target...
ERROR: C:/dev/gapid/gapii/apk/BUILD.bazel:17:1: failed to delete output files before executing action: C:/users/pascal/_bazel_pascal/cmrueb7c/execroot/gapid/bazel-out/android-arm64-v8a-fastbuild/bin/gapii/apk/nativedeps/gapii/libgapii.so (Permission denied)
Target //:pkg failed to build
INFO: Elapsed time: 7.697s, Critical Path: 5.14s
INFO: 12 processes, local.
FAILED: Build did NOT complete successfully

C:\dev\gapid>bazel build pkg
INFO: Analysed target //:pkg (0 packages loaded).
INFO: Found 1 target...
ERROR: C:/dev/gapid/gapii/apk/BUILD.bazel:17:1: failed to delete output files before executing action: C:/users/pascal/_bazel_pascal/cmrueb7c/execroot/gapid/bazel-out/android-x86-fastbuild/bin/_nativedeps/077af9bd475f3d8e6a7aac4cd66821f3.so (Permission denied)
Target //:pkg failed to build
INFO: Elapsed time: 46.286s, Critical Path: 43.75s
INFO: 24 processes, local.
FAILED: Build did NOT complete successfully

C:\dev\gapid>bazel build pkg
INFO: Analysed target //:pkg (0 packages loaded).
INFO: Found 1 target...
ERROR: C:/dev/gapid/gapii/apk/BUILD.bazel:17:1: failed to delete output files before executing action: C:/users/pascal/_bazel_pascal/cmrueb7c/execroot/gapid/bazel-out/android-x86-fastbuild/bin/gapii/apk/nativedeps/gapii/libgapii.so (Permission denied)
Target //:pkg failed to build
INFO: Elapsed time: 4.668s, Critical Path: 2.24s
INFO: 4 processes, local.
FAILED: Build did NOT complete successfully

C:\dev\gapid>bazel build pkg
INFO: Analysed target //:pkg (0 packages loaded).
INFO: Found 1 target...
ERROR: C:/dev/gapid/gapii/apk/BUILD.bazel:17:1: failed to delete output files before executing action: C:/users/pascal/_bazel_pascal/cmrueb7c/execroot/gapid/bazel-out/android-armeabi-v7a-fastbuild/bin/_nativedeps/1b627f32a9110058c42c051ea5f0a1be.so (Permission denied)
Target //:pkg failed to build
INFO: Elapsed time: 61.789s, Critical Path: 59.14s
INFO: 10 processes, local.
FAILED: Build did NOT complete successfully

C:\dev\gapid>bazel build pkg
INFO: Analysed target //:pkg (0 packages loaded).
INFO: Found 1 target...
ERROR: C:/dev/gapid/gapii/apk/BUILD.bazel:17:1: failed to delete output files before executing action: C:/users/pascal/_bazel_pascal/cmrueb7c/execroot/gapid/bazel-out/android-armeabi-v7a-fastbuild/bin/gapii/apk/nativedeps/gapii/libgapii.so (Permission denied)
Target //:pkg failed to build
INFO: Elapsed time: 4.479s, Critical Path: 1.89s
INFO: 1 process, local.
FAILED: Build did NOT complete successfully

C:\dev\gapid>bazel build pkg
INFO: Analysed target //:pkg (0 packages loaded).
INFO: Found 1 target...
Target //:pkg up-to-date:
  <stuff>
INFO: Elapsed time: 11.864s, Critical Path: 9.38s
INFO: 37 processes, local.
INFO: Build completed successfully, 39 total actions
laszlocsomor commented 6 years ago

Sorry, no updates. I tried building GAPID again today with Bazel 0.14.1, ran into some problems that I had to fix (e.g. badly configured PATH, wrong Android SDK installed) and that took up my time. I'll try looking again next Monday.

laszlocsomor commented 6 years ago

FYI, https://github.com/bazelbuild/bazel/pull/5520 will make file deletion more robust and be able to delete readonly files (together with a follow-up PR that makes use of the new code).

Those, in combination with some recent commits that make Bazel eagerly close files (https://github.com/bazelbuild/bazel/commit/59f17d6e0550bf63a0b6ef182e2d63474e058ede, https://github.com/bazelbuild/bazel/commit/09d20311d982606093ed881d779bb05a5ee70ed3, https://github.com/bazelbuild/bazel/commit/f11c6bcab87ee8927e23a23d2300900a0922616d), will hopefully solve this problem.