bazel-contrib / rules_foreign_cc

Build rules for interfacing with "foreign" (non-Bazel) build systems (CMake, configure-make, GNU Make, boost, ninja, Meson)
https://bazel-contrib.github.io/rules_foreign_cc
Apache License 2.0
679 stars 249 forks source link

Fix `_get_make_variables` ignoring user environment variables #1230

Closed allsey87 closed 3 months ago

allsey87 commented 4 months ago

This resolves a bug in _get_make_variables that causes the user environmental variables such as LDFLAGS to be ignored.

In the case of LDFLAGS, this happens when cxx_linker_executable is empty and results in completely ignoring [user_env_vars[user_var]] since toolchain_val is None.

I believe this branch was put in place to guard against adding None to a list. I have solved this by removing the branch and ensuring that toolchain_val is always a list even if variables like cxx_linker_executable are empty.

allsey87 commented 4 months ago

I don't have a Window machine but at a guess it seems that LDFLAGS is now picking up a bunch of environment variables that are unrelated to the build?

LDFLAGS='-nologo -SUBSYSTEM:CONSOLE -MACHINE:X64 -DEFAULTLIB:msvcrt.lib -DEBUG:FASTLINK -INCREMENTAL:NO -LC:/b/utfpdh5p/execroot/rules_foreign_cc/external/glib_runtime/bin -LC:/b/utfpdh5p/execroot/rules_foreign_cc/external/gettext_runtime/bin'

Annoyingly Bazel does not print out the build environment when the test succeeds so there is no way in determining what was inside LDFLAGS before (from CircleCI). One of these flags is probably responsible for the crash on Windows.

Any thoughts on how to approach this?

allsey87 commented 3 months ago

@jsharpe @jheaff1 I will look into fixing this once #1223 is reviewed and merged. I cannot invest time into this sort of work otherwise...

allsey87 commented 3 months ago

@jsharpe thanks for merging #1223! I will try to get to the bottom of the CI errors on Windows tomorrow

allsey87 commented 3 months ago

Digging in a bit deeper to this, what is happening is pkgconfig is failing to build on Windows. It seems that pkgconfig has a dependency on glib which in turn is trying to find stddef.h which it fails to so.

C:\b\cggvqaw6\execroot\rules_foreign_cc\external\glib_dev\include\glib-2.0\glib/gmacros.h(40): fatal error C1083: Cannot open include file: 'stddef.h': No such file or directory
parse.c
C:\b\cggvqaw6\execroot\rules_foreign_cc\external\glib_dev\include\glib-2.0\glib/gmacros.h(40): fatal error C1083: Cannot open include file: 'stddef.h': No such file or directory
rpmvercmp.c
C:\b\cggvqaw6\execroot\rules_foreign_cc\external\glib_dev\include\glib-2.0\glib/gmacros.h(40): fatal error C1083: Cannot open include file: 'stddef.h': No such file or directory
main.c
C:\b\cggvqaw6\execroot\rules_foreign_cc\external\glib_dev\include\glib-2.0\glib/gmacros.h(40): fatal error C1083: Cannot open include file: 'stddef.h': No such file or directory
NMAKE : fatal error U1077: '"C:/Program Files (x86)/Microsoft Visual Studio/2022/BuildTools/VC/Tools/MSVC/14.39.33519/bin/HostX64/x64/cl.exe" /MD /O2 /GL /W3 /Zi /FImsvc_recommended_pragmas.h /I.

At first, I thought it have something to do with the following environment variable overwriting the include dirs that contain stdbool.h:

INCLUDE=C:/b/cggvqaw6/execroot/rules_foreign_cc/external/glib_src

However, the code in question shouldn't interact with INCLUDE since it is not inside _MAKE_FLAGS. Another possible cause of the issue could be if a tool is specified in user_env_vars, it will update tools_dict without going through the _absolutize function and logic for which there are comments specifically talking about Windows and nmake...

https://github.com/bazelbuild/rules_foreign_cc/blob/60d39590137735529a7636894d7f7d54ecd6e52d/foreign_cc/private/make_env_vars.bzl#L111-L134

allsey87 commented 3 months ago

@jsharpe all lights are finally green! There were two things preventing Windows builds from working correctly:

  1. INCLUDE=["$EXT_BUILD_ROOT/external/glib_src"]} was breaking pkgconfig compilation (stddef.h not found)
  2. PATH=["$(dirname $$EXT_BUILD_ROOT$$/external/nasm/nasm.exe)"] was breaking OpenSSL compilation (nasm not found)

It is a bit strange that these flags are being set in the first place since not only are they not required for compilation, they actually break it. I believe these come from the configuration of the Windows VMs for Buildkite and Bazel CI, although I could not locate the sources that result in these environment variables being set. I suspect that these variables should be cleared or we should run CI with --incompatible_strict_action_env, however, this is beyond the scope of this PR/repository.

The fix I put in place was to only consider the environment variables listed in _MAKE_FLAGS since my initial attempt accidently opened the floodgates and let all environmental variables through, not just the _MAKE_FLAGS. I had to rewrite this function a bit to achieve this, however, I believe the new logic is correct.