bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0
23.32k stars 4.09k forks source link

Java binary startup script broken when targeting windows (cygwin) #20065

Open th0masb opened 1 year ago

th0masb commented 1 year ago

Description of the bug:

I am building Java binaries on Linux which target Windows. The workflow is essentially build an archive containing the runfiles for my binary, copy that onto windows, extract and then run via the standard startup script which looks like it is designed to support cygwin. However the script is broken in a number of ways when I try to run on cygwin.

Which category does this issue belong to?

Java Rules

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

I've created a minimal repository for reproducing the issue:

  1. Clone https://github.com/th0masb/bazel-java-startup-bug on a Linux environment
  2. Run bazel build --platforms //:windows_x86_64 :dist.tar
  3. Copy the resulting dist.tar archive to a windows machine with cygwin
  4. Unpack the dist.tar in cygwin
  5. Execute the entrypoint script from cygwin, it should fail. Execute the script with --classpath_limit=1, it should fail with a different reason.

Sample output:

$ ./.runfiles/__main__/main
Error: Could not find or load main class Main
Caused by: java.lang.ClassNotFoundException: Main

$ ./.runfiles/__main__/main --classpath_limit=1
./.runfiles/__main__/main: line 366: D:\Tools\Tom\.runfiles/remotejdk17_win/bin/java/jar.exe: No such file or directory
./.runfiles/__main__/main: ERROR: /cygdrive/d/Tools/Tom/./.runfiles/__main__/main failed because D:\Tools\Tom\.runfiles/remotejdk17_win/bin/java/jar.exe failed

Which operating system are you running Bazel on?

Host + Execution Linux (Pop!_OS 22.04 LTS x86_64), target is Windows x86_64

What is the output of bazel info release?

release 6.4.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

Unsure

Have you found anything relevant by searching the web?

No

Any other information, logs, or outputs that you want to share?

I have identified how this could be fixed, the changes should be fairly minimal and low risk. I am happy to try and open a PR with a fix myself.

th0masb commented 1 year ago

I can see this todo in the code https://github.com/bazelbuild/bazel/blob/master/src/main/starlark/builtins_bzl/common/java/java_helper.bzl#L252 which looks like it is contributing to my issue, is the desired behaviour to cross compile the windows executable when targeting windows from linux?

fmeum commented 1 year ago

CC @tjgq

Cross-compiling to Windows is hard, but there are precompiled launcher binaries shipped with the Windows build of Bazel that could be used for this purpose.

th0masb commented 1 year ago

I'm not sure I understand why this is difficult wrt the launcher? It could be implemented as a batch script analogous to this shell template as opposed to a compiled binary.

Anyway I would prefer to keep the scope of this issue restricted to cygwin. It is clear just from skimming the stub template that cygwin is intended to be a supported environment (there are lots of existing cygpath calls) and this issue is just about fixing that support.

hvadehra commented 10 months ago

It is clear just from skimming the stub template that cygwin is intended to be a supported environment (there are lots of existing cygpath calls)

So, the stub script isn't used/supported for windows. The cygwin references you see there are leftover from when it used to be (~2016) and these never got cleaned up. Since 2017, a cc binary is used instead. I found the design doc that discusses the motivation and alternatives. I agree with you in that we could use a batch script, and the doc does have some discussion for why this path was not chosen.

Zooming out, cross-compiling for/from windows is currently completely broken, so I'm afraid your workflow just won't work. As you've already noticed, the issue is that targetting windows assumes Bazel itself is running on Windows. I made a change recently (https://github.com/bazelbuild/bazel/commit/840f440e64ed7716436f507667b7391a4ef4bbae) that is a first step towards fixing this. But there is more work left, in particular, around the launcher, which is still tied to the host system.

I plan to get around to fixing this, but I'm afraid it's not high-priority. Contributions would be very welcome.

th0masb commented 10 months ago

so I'm afraid your workflow just won't work

My workflow is currently working with this change, although it is hacky. Basically I'm applying this change to the startup script using a diff file, the patch utility and a genrule as part of the build process. Then on the windows host I am invoking cygwin from a simple(ish) batch script to call this patched startup script. Everything works ok, the main issue is needing cygwin installed on the host.

From our experience this is still easier than introducing cross compilation of a c++ binary, especially for a team of devs not familiar with c++ or its build process.

I agree with you in that we could use a batch script, and the doc does have some discussion for why this path was not chosen.

I looked and I don't agree that the right decision was taken here, I'm not disagreeing it would be difficult but I think it should have been seriously explored first before going with a c++ launcher. If we had such a batch script then cross compilation would become easy and Java devs wouldn't have to think about c++.

hvadehra commented 10 months ago

I looked and I don't agree that the right decision was taken here,

FWIW this is my inclination as well, but I might be underestimating the difficulty of the alternatives.

it should have been seriously explored first before going with a c++ launcher.

@meteorcloudy any chance we still have the batch/ps scripts from the design doc?

fmeum commented 10 months ago

From our experience this is still easier than introducing cross compilation of a c++ binary, especially for a team of devs not familiar with c++ or its build process.

Just to clarify this point: While the launcher binary is written in C++, I don't think that the plan is to require this binary to be compiled from source in cross-platform builds (or even in single-platform builds on Windows). It would be shipped in precompiled form. However, right now it is shipped with the Bazel Windows binary only. To support cross-platform builds, it would need to be moved out into some external repository that is fetched only when targeting Windows.

th0masb commented 10 months ago

Ok, that makes more sense

meteorcloudy commented 10 months ago

any chance we still have the batch/ps scripts from the design doc?

Using batch or powershell scripts were considered infeasible because of performance and maintenance issues.