bazel-contrib / toolchains_llvm

LLVM toolchain for bazel
Apache License 2.0
296 stars 213 forks source link

discrepancy with toolchain on builds on same OS & Arch #347

Closed fzakaria closed 2 months ago

fzakaria commented 3 months ago

We are using toolchain_llvm to try and bring reproducibility for C/C++ compilation however we've hit some oddities across two developer laptops.

I'm struggling to find the cause of the delta.

MODULE.bazel

llvm = use_extension("@toolchains_llvm//toolchain/extensions:llvm.bzl", "llvm")
llvm.toolchain(
   llvm_version = "16.0.0",
)

use_repo(llvm, "llvm_toolchain")
register_toolchains("@llvm_toolchain//:all")

We are building libjpeg https://github.com/freedesktop-unofficial-mirror/libjpeg/

load("@rules_cc//cc:defs.bzl", "cc_library")

genrule(
    name = "jconfig",
    srcs = [
        "jconfig.cfg",
        "jcmaster.c",
        "install-sh",
    ],
    outs = ["jconfig.h"],
    # Why do we need '-ansi' ?
    # Well to build with clang on macos or with the toolchain below we have to set it.
    # libjpeg is very old and predates C99 when you didn't even have to write return statements or return types.
    # Here was the error without it:
    # configure:626:1: error: type specifier missing, defaults to 'int'; ISO C99 and later do not support implicit int [-Wimplicit-int]
    #    main(){return(0);}
    #    ^
    #    int
    #    1 error generated.
    #    configure: failed program was:
    #    #line 625 "configure"
    #    #include "confdefs.h"
    #    main(){return(0);}
    cmd = "$(location configure) --srcdir $$(dirname $(location jconfig.cfg)); cp jconfig.h $(@)",
    toolchains = ["@bazel_tools//tools/cpp:current_cc_toolchain"],
    tools = ["configure"],
)

cc_library(
    name = "libjpeg",
    srcs = [
        "jcapimin.c",
        "jcapistd.c",
        "jccoefct.c",
        "jccolor.c",
        "jcdctmgr.c",
        "jchuff.c",
        "jcinit.c",
        "jcmainct.c",
        "jcmarker.c",
        "jcmaster.c",
        "jcomapi.c",
        "jconfig.h",
        "jcparam.c",
        "jcphuff.c",
        "jcprepct.c",
        "jcsample.c",
        "jctrans.c",
        "jdapimin.c",
        "jdapistd.c",
        "jdatadst.c",
        "jdatasrc.c",
        "jdcoefct.c",
        "jdcolor.c",
        "jddctmgr.c",
        "jdhuff.c",
        "jdinput.c",
        "jdmainct.c",
        "jdmarker.c",
        "jdmaster.c",
        "jdmerge.c",
        "jdphuff.c",
        "jdpostct.c",
        "jdsample.c",
        "jdtrans.c",
        "jerror.c",
        "jfdctflt.c",
        "jfdctfst.c",
        "jfdctint.c",
        "jidctflt.c",
        "jidctfst.c",
        "jidctint.c",
        "jidctred.c",
        "jmemmgr.c",
        "jquant1.c",
        "jquant2.c",
        "jutils.c",
    ] + [
        # This implementation uses memory to store so it's extremely portable
        # consider alternate implementations if you need to reduce memory usage.
        "jmemnobs.c",
    ],
    hdrs = glob(["*.h"]),
    includes = ["."],
    visibility = ["//visibility:public"],
    copts = [
        "-v"
    ]
)

The code is written in C90 which supports "implicit-int". Oddly on my machine, the code compiles correctly but on another it fails when hitting functions with implicit-int.

The solution at the moment is to turn off that check or add "-ansi" to copts but I'm left wondering why it's not consistent across our two machines.

We collected the compiler output to track differences if there was different flags being set and I couldn't see anything meaningful that would cause it. https://www.diffchecker.com/390Vul86/ (The diff does show that the toolchain is being picked up)

rrbutani commented 3 months ago

I'm not sure I'm following; when you say you end up needing to add -ansi to copts do you mean for the libjpeg target or for the jconfig genrule?

Regardless I suspect the issue here is that your configure script is using your host compiler instead of toolchains_llvm's compiler — adding to the toolchains attribute adds make vars that can be referenced in cmd but it doesn't automatically set env vars for these variables. Maybe try something like this?

-    cmd = "$(location configure) --srcdir $$(dirname $(location jconfig.cfg)); cp jconfig.h $(@)",
+    cmd = "CC='$(CC) $(CC_FLAGS)' $(location configure) --srcdir $$(dirname $(location jconfig.cfg)); cp jconfig.h $(@)",

Alternatively this might be a good use case for rules_foreign_cc.

fzakaria commented 3 months ago

Shoot. The example I put above is missing the $(CC) for configure but then I found that the building of the library afterwards had a similar error.

I set action_env PATH for the build of the cc_library so that they're the same across both machines and the error was reproduced on my machine.

It makes me believe there is something during cc_library that's picking up from the host as well even though I'm using toolchain

rrbutani commented 3 months ago

Hmm. Is the repo where you have this target public? Also are you able to share what $PATH was set to when you reproduced the error on your machine?

You can prevent host env vars from leaking into the build by setting --incompatible_strict_action_env but in this case I'm struggling to understand how different $PATHs could result in the error you're seeing. The -v output you shared shows that the clang from toolchains_llvm is being invoked; from there $PATH can influence what additional tools are invoked (i.e. ld64) but that's not what's happening in this case (error is coming from the compiler invocation itself).

fzakaria commented 2 months ago

Closing the loop here: I need to set CC_FLAGS but for that to work, I needed a different toolchain than what's listed in the doc.

This was causing the ./configure script to generate a different header file.