bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0
22.69k stars 3.98k forks source link

cc_library: strip_include_prefix doesn't work on external repo paths #21443

Open novas0x2a opened 4 months ago

novas0x2a commented 4 months ago

Description of the feature request:

Note: description is left untouched for context, but the comments below make it more clear the problem is different than originally assumed

In cc_library, the hdrs field docs say

These headers will be made available for inclusion by sources in this rule or in dependent rules.

However, cc_library doesn't copy the hdr inputs to the outputs (or otherwise project indirect dependencies), so the headers are only made available to dependent rules if the hdrs are local to the package. In other words, this does not currently work and I'd like it to:

cc_library(
    name = "cat",
    srcs = ["cat.c"],
    hdrs = ["@other//:cat.h"],
)

# This will fail, being unable to find cat.h
cc_binary(
    name = "dog",
    srcs = ["dog.c"], # assume this has an `#include "cat.h"`
    deps = [":cat"],
)

Having it copy the headers to the outputs would make it behave somewhat like cc_binary does today regarding external dlls, causing things to work regardless of where the source files come from.

A full (though obviously toy) example is here: https://github.com/novas0x2a/external-hdr. It has two local targets:

Which category does this issue belong to?

No response

What underlying problem are you trying to solve with this feature?

When integrating thirdparty packages that don't have bazel-native build systems, sometimes you have to do some gross things to make it work; the usual advice is to provide your own build_file/build_file_content in the repo rule, and then do what you need to. However, if you need to get the header from a package that isn't local to the external (like, say, it depends on files in the primary repo workspace) now you end up needing a mix of external and non-external packages, and it creates kind of spaghetti-like flow that makes packages very difficult to follow.

If cc_library copied non-local hdrs to its outputs, then the complexity/confusion goes away.

Which operating system are you running Bazel on?

linux

What is the output of bazel info release?

release 7.0.2

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

n/a

What's the output of git remote get-url origin; git rev-parse HEAD ?

n/a

Have you found anything relevant by searching the web?

There's a somewhat similar ticket in https://github.com/bazelbuild/bazel/issues/19972, that's actually affected by this same issue, but also had a configuration issue and had a different intent (even if the fix is likely similar).

Any other information, logs, or outputs that you want to share?

No response

alexeagle commented 2 months ago

@jsharpe @illicitonion does this request make sense to you?

illicitonion commented 2 months ago

From the example repo, I think the problem is one of unexpected include paths?

In this example:

cc_binary(
    name = "dog-remote",
    srcs = ["dog.c"],
    deps = ["@cat//:cat-remote"],
)

cat-remote is defined:

cc_library(
    name = "cat-remote",
    srcs = ["cat.c"],
    hdrs = ["@external-hdr//other:cat.h"],
)

In cat-remote, cat.h isn't in the root of the workspace, it's in a directory: other/cat.h. So it needs to be #included as "other/cat.h" not "cat.h".

If you apply this change:

diff --git a/dog.c b/dog.c
index 513914d..7a45bab 100644
--- a/dog.c
+++ b/dog.c
@@ -1,4 +1,4 @@
-#include "cat.h"
+#include "other/cat.h"

 int main(int argc, char *argv[]) {
   return meow();

then dog-remote will successfully build (but dog-local will break :)).

So what I think is going on here is that the files are correctly present, but at different paths than you're expecting?

I don't think it makes sense for us to copy the files to different paths, but maybe there's a gap around not being able to apply strip_include_prefix and similar to these headers? But probably the solution to that is to wrap the header in a cc_library in its remote repo, which I guess is the thing you're saying is annoyingly boilerplatey (which I agree about!)?

novas0x2a commented 2 months ago

@illicitonion thanks for the reply. Part of what you're seeing is my lossy repro of our internal code (the other path working) but you're right, it does get into the sandbox, it's just that the include would need to be

#include "external/cat/cat.h"

which is a pretty painful thing to propagate for all affected packages. The gap really might just be around strip_include_prefix; If I pass strip_include_prefix = "/external/cat" I get Error in fail: header 'cat.h' is not under the specified strip prefix '/external/cat' (and the relative path version of strip_include_prefix fails similarly) (This is the error message I misjudged that led me to think the header wasn't there at all)

Yeah, in theory putting the cc_library call in the remote repo would likely work, but the boilerplate is extremely prohibitive; it's hard to show how publically, but we're doing an internal migration of an existing codebase, and that leaves us less flexible and the boilerplate required is even more painful than you're imagining, I think :))

jsharpe commented 2 months ago

I've never understood the cases where strip_include_prefix is used on its own; I always prefer to use includes. I'd only use strip_include_prefix if I'm also making use of include_prefix. The reason why is because strip_include_prefix generates extra symlinks on the filesystem in the virtual includes directory.

So in this case I would simply add includes = ["external/cat"] to my cc_library which would allow me to #include "cat.h" in any dependent targets.

illicitonion commented 2 months ago

@novas0x2a I think it'd be helpful to know what the target you're actually depending on in the external repo is...

If there is a cc_library in the external repo, then you should be able to add includes = ["."] to that cc_library (which may have some bugs around prefix stripping, but which should be fixable).

If you're actually doing something like adding an exported file to srcs or something, that's quite a different solution design space.

But if you can put together a representative example of what you're trying to do, I suspect we can work out something reasonable!

This also feels related to https://github.com/bazelbuild/bazel/issues/19665 which is asking for something quite similar but for the --per_file_copt flag

novas0x2a commented 2 months ago

@jsharpe point taken about the prefix options (we'd prefer not to use them either). An important note is that includes don't work for externals either, because it appears to require that the paths are relative to the local package. Assuming the cc_library call is in //thirdparty/dog:BUILD:

@illicitonion we're doing the latter, adding an exported file to srcs. I'll try to come up with a more representative example. It'll probably still be lossy, but probably less lossy than the one I originally posted. (I was trying to come up with a version that'd be acceptable to add to //src/test/shell/bazel:cc_integration_test)

matt-sm commented 1 month ago

This is a more representative example: https://github.com/matt-sm/cc-library-repo

illicitonion commented 1 month ago

Thanks for the additional repro! To check, you are creating cc_library targets, just not in the external repo?

Assuming so, I think the key building block we'd want to improve is setting the includes attribute in that cc_library.

This change makes your code compile:

diff --git a/thirdparty/xxHash/BUILD.bazel b/thirdparty/xxHash/BUILD.bazel
index ba30ae9..83f9767 100644
--- a/thirdparty/xxHash/BUILD.bazel
+++ b/thirdparty/xxHash/BUILD.bazel
@@ -7,6 +7,7 @@ cc_library(
         "@xxHash//:xxhash.h",
     ],
     strip_include_prefix = "",
+    includes = ["../../external/xxHash"],
     visibility = ["//visibility:public"],
 )

The idea here is that the cc_library is responsible for propagating to its dependees the idea that its headers need to be included somehow. This feels like a reasonable place for this responsibility to live, just it's an ugly thing to express right now.

imo the two things that make this ugly are:

  1. Needing to compute a number of ../s to prefix to the path to read the execroot relative to the current package. An ugly workaround for this is a function like def up_to_execroot(): return "../" * len(native.package_name().split("/")) but ideally you wouldn't need to call this at all.
  2. Needing to hard-code the convention that repositories are made present in the execroot as"external/[repo-name]"

I can imagine a few different ways this could look, but I imagine we could pretty reasonably make one of these two things work:

diff --git a/thirdparty/xxHash/BUILD.bazel b/thirdparty/xxHash/BUILD.bazel
index ba30ae9..83f9767 100644
--- a/thirdparty/xxHash/BUILD.bazel
+++ b/thirdparty/xxHash/BUILD.bazel
@@ -7,6 +7,7 @@ cc_library(
         "@xxHash//:xxhash.h",
     ],
     strip_include_prefix = "",
+    includes = ["."], # "." gets automatically expanded to an execpath-relative path to the current package - this is how includes = ["."] currently works within a repo, so making cross-repo aware seems reasonable
     visibility = ["//visibility:public"],
 )
diff --git a/thirdparty/xxHash/BUILD.bazel b/thirdparty/xxHash/BUILD.bazel
index ba30ae9..83f9767 100644
--- a/thirdparty/xxHash/BUILD.bazel
+++ b/thirdparty/xxHash/BUILD.bazel
@@ -7,6 +7,7 @@ cc_library(
         "@xxHash//:xxhash.h",
     ],
     strip_include_prefix = "",
+    includes = ["$(execdir .)"], # $(execdir) would act like $(execpath) but operate on a directory in the current package rather than a label. I've seen need for this elsewhere, e.g. when telling rules_foreign_cc and friends about include directories, e.g. https://github.com/bazelbuild/rules_rust/blob/dd7a6944a1a8d246e9c800b16e6bc35d49e07b80/examples/crate_universe/WORKSPACE.bazel#L673-L674
     visibility = ["//visibility:public"],
 )
matt-sm commented 1 month ago

Thanks so much for the response

you are creating cc_library targets, just not in the external repo

yes that's correct

The '../../' workaround will unblock us for now. And I like some of your ideas as to how this could be solved in Bazel.