facebook / buck2

Build system, successor to Buck
https://buck2.build/
Apache License 2.0
3.35k stars 195 forks source link

Buck not isolating the compiler process #225

Open andyfriesen opened 1 year ago

andyfriesen commented 1 year ago

The compiler process seems to be able to find headers that are not listed as dependencies. Is this expected?

My test case is thus:

// hello.cpp
#include <stdio.h>
#include "bad.h"

int main() {
    printf("Hello World!\n");
    return 0;
}

// bad.h

// A commented-out syntax error!

And a simple BUCK build script:

cxx_binary(
    name="hello",
    srcs=["hello.cpp"],
    link_style='static'
)

As you can see, I have forgotten to list bad.h as a dependency.

buck2 will happily compile this and if bad.h is changed, it will not recompile anything.

thoughtpolice commented 1 year ago

Unfortunately Buck2 isn't (yet) hermetic in local mode, as of right now; it is only hermetic when using remote execution, which is by definition hermetic, because the input files have to be uploaded and declared correctly before they can be materialized and commands invoked on the remote executor. My understanding is that remote execution was an afterthought in buck1 but a major priority in buck2, and so therefore most rules are safely remote capable by default (with a proper toolchain); therefore relying on remote execution in production is 'good enough' in practice, but somewhat poor for the rest of us. See also #105; ideally there should be a local-only executor that enforces similar levels of isolation, but it doesn't exist yet.

stepancheg commented 1 year ago

FYI Buck1 can track headers by parsing compiler output. Compilers can output all headers used. But Buck2 does not do that.

cjhopman commented 1 year ago

I think buck2 feasibly could, it's already parsing the depfiles that buck1 uses for detecting this. I'm actually kinda surprised that it doesn't end up being an error in buck2's depfile handling when it ends up that there's an entry that wasn't in the initial inputs. @krallin might recall whether that's necessary for something or not.

meroton-benjamin commented 1 year ago

You could get proper isolation even when running locally if you use a remote execution instance on your development machine and set up buck to use "remote execution" at localhost.

It should be fairly simple run run.sh from buildbarn to start a local buildbarn instance and set your .buckconfig as in the example.

krallin commented 1 year ago

I'm actually kinda surprised that it doesn't end up being an error in buck2's depfile handling when it ends up that there's an entry that wasn't in the initial inputs. @krallin might recall whether that's necessary for something or not.

IIRC, there were some tools (specifically with XCode) that might end up reporting some paths that were outside the input set and were instead part of the XCode install.

andyfriesen commented 1 year ago

This has been super helpful. I'll give buildbarn a shot.

Thanks everyone!

andyfriesen commented 1 year ago

Hello again!

I have one futher question:

I have successfully gotten Buildbarn going, but the remote environment does not have access to a compiler. Is that something I configure on the Buck side of things or is that a Buildbarn thing?

thoughtpolice commented 1 year ago

My understanding is that you have two options going forward:

IMO the second method is how things should be endorsed to work by default, because it means your container doesn't need to have so much stuff inside of it. But A) it's tricky — where do you download gcc/XYZ-language binaries from? GCC doesn't do binaries for instance, and B) there's not an immediately obvious example.

Actually, it does exist I think, the best example of this is this BUCK file for a Zig-powered C/C++ toolchain under examples/. It downloads the Zig compiler via download_zig_distribution(name = "zig", ...), then points the CXX toolchain to use it as the compiler via cxx_zig_toolchain(..., distribution = ":zig"). Why does it do this? Because Zig includes both a version of Clang, and it includes the necessary libc/libcxx bits to compile code out of the box, so it can work as a kind of portable C/C++ compiler all-in-one. In contrast, downloading clang from llvm.org for example isn't enough; would still require you to also set up your libc/libc++ too. That has to happen in one of two ways, just like I described above. (Maybe one day when LLVM ships their libc, the llvm.org binary distributions will truly be all-in-one...)

See also #147 and this comment in particular of mine where I lamented about this a bit. The http_archive route is the more "pure Buck/Bazel" approach in my mind versus populating container images with lots of random things. But you'll probably need a mix of both.

Check out that whole example. Maybe the "Using Zig as a C++ compiler" cxx_zig_toolchain example should be a more up-front example for C/C++ users, since it can in theory bootstrap a C/C++ toolchain completely from scratch without the user needing to do anything. It won't work for everyone, it's a little strange to people who aren't familiar with it ("what is Zig, and why do I use it for C++ compilation?"), but it should act more like a one-shot example I think. And it should be hermetic, remote-execution safe, too.

andyie commented 1 year ago

Bazel avoids this missing dependency problem by symlinking each library's sources into its own temporary directory tree, with only symlinks to the headers of its dependencies present. Could Buck2 do a similar thing?

It's easy to omit dependencies by accident, so not enforcing them feels pretty scary. Even if Buck2's main use-case of remote builds side-steps this problem, running a local build server feels pretty extra for a local build-centric project.

thoughtpolice commented 1 year ago

See #105 and https://github.com/facebook/buck2/issues/105#issuecomment-1509307503 for some notes on what I think will probably happen in the short term. TL;DR writing a localhost-only server that provides the remote execution APIs and re-using the existing remote-codepath for the local case is probably the easiest short-term solution. It could also be useful for other systems like Pants or Bazel or anything needing it. Note that the remote execution API support is still in progress e.g. #222 so there's some work to be done there too.

The implementation specifics of how files are handled hermetically is basically an implementation detail — e.g. Windows 11 now has symlinks so you could do that there, but on Linux you could also use cgroups to enforce more purity in the process and filesystem hierarchy by being careful with mounts. So I think reusing the remote execution codepath actually has some benefits here by keeping these things separated. But at the end of the day, I think it's largely a matter of "draw the circle, then draw the owl". It needs to be designed and then implemented. Maybe using these APIs in the localhost case actually is overly pessimistic or complex, for example, and better performance could be had without all that stuff.

I unfortunately have not had time to implement anything extra into reapi-server as mentioned in #105. A production quality implementation will have to be designed up front a little; I have some ideas about efficient deduplicated storage and other notes. Feel free to help!

andyie commented 1 year ago

Gotcha, yeah, a built-in local version of a remote execution server sounds awesome! I suggested symlink trees because AFAICT Buck2 already uses them for headers, so I figured it would be straightforward to extend to sources.