Running two top-level targets with overlapping third party Rust crates results in cascading build failures

nickgerace commented 1 year ago

Description

When running two top-level targets that have overlapping third party Rust crate dependencies with targets generated by reindeer, cascading build failures will occur.

Reproduction

This can be reproduced when they’re run at the same time in two different terminals. First, create two Rust binaries that depend on at least one of the same third party Rust crates (let's call them foo and bar). Then, vendor the third party rust crates with reindeer and then run each target in two different terminals.

One terminal:

buck2 run :foo

Another terminal:

buck2 run :bar

You will see multiple errors in each terminal.

Here's an example from the first terminal:

Local command returned non-zero exit code 1                                                                                                                       [14/1076]
Reproduce locally: `/usr/bin/env "PYTHONPATH=buck-out/v2/gen/prelude/213ed1b7ab869379/rust/tools/__rustc_action__/__rust ...<omitted>... t/213ed1b7ab869379/third-party/rus
t/__strsim-0.10.0__/rlib-pic-static_pic-link/strsim-link-diag.args (run `buck2 log what-failed` to get the full command)`
stdout:
stderr:
error: failed to create encoded metadata from file: No such file or directory (os error 2)

error: aborting due to previous error

Action failed: root//third-party/rust:adler-1.0.2 (rustc rlib-pic-static_pic-link/adler-link rlib,pic,link [diag])
Local command returned non-zero exit code 1
Reproduce locally: `/usr/bin/env "PYTHONPATH=buck-out/v2/gen/prelude/213ed1b7ab869379/rust/tools/__rustc_action__/__rust ...<omitted>... root/213ed1b7ab869379/third-party/
rust/__adler-1.0.2__/rlib-pic-static_pic-link/adler-link-diag.args (run `buck2 log what-failed` to get the full command)`
stdout:
stderr:
error: couldn't create a temp dir: No such file or directory (os error 2) at path "/Users/nick/src/si/buck-out/v2/gen/root/213ed1b7ab869379/third-party/rust/__adler-1.0.2_
_/rlib-pic-static_pic-link/extras/libadler/rmetaYIVsgh"

error: aborting due to previous error

Here's an example from the second terminal:

error: Failed to load argument file: IO Error: /Users/nick/src/si/buck-out/v2/tmp/root/213ed1b7ab869379/third-party/rust/__anstyle-1.0.0__/rustc/_buck_936af9e130ade2[0/47]
tc-args-b9fle1tv.txt: No such file or directory (os error 2)

Action failed: root//third-party/rust:hashbrown-0.12.3 (rustc rlib-pic-static_pic-link/hashbrown-link rlib,pic,link [diag])
Local command returned non-zero exit code 1
Reproduce locally: `/usr/bin/env "PYTHONPATH=buck-out/v2/gen/prelude/213ed1b7ab869379/rust/tools/__rustc_action__/__rust ...<omitted>... d1b7ab869379/third-party/rust/__ha
shbrown-0.12.3__/rlib-pic-static_pic-link/hashbrown-link-diag.args (run `buck2 log what-failed` to get the full command)`
stdout:
stderr:
error: could not write output to buck-out/v2/gen/root/213ed1b7ab869379/third-party/rust/__hashbrown-0.12.3__/rlib-pic-static_pic-link/extras/libhashbrown/hashbrown-174a03d
2.hashbrown.5a421c79-cgu.0.rcgu.o: No such file or directory

error: aborting due to previous error

Build ID: b8817a51-747b-43ab-8692-a1fa0b0d814e
Jobs completed: 2.0 K. Time elapsed: 3.6s. Cache hits: 0%. Commands: 82 (cached: 0, remote: 0, local: 82)
BUILD FAILED
Failed to build 'root//third-party/rust:hashbrown-0.12.3 (prelude//platforms:default#213ed1b7ab869379)'

Additional Thoughts

This appears to be OS/ARCH agnostic
I am wondering if DICE is working as intended here as I would expect the third party crate to be built once (wherever its seen in the graph first) and its output to be used for both binaries

As an aside, thanks for this great project!

davidbarsky commented 1 year ago

It's generally expected that you'd use isolation directories for concurrent builds. At least for LSPs at fb, isolation directories are used to prevent LSP-spawned processes from clobbering terminal-spawned process and vice-versa.

(Speaking only as a user of Buck2, not as a member of the team, I'd love to have more fine-grained locking inside of Buck2, but I understand that it might be very tricky to implement.)

adamhjk commented 1 year ago

I think this is a slightly different thing? The buck2 daemon for me seems to do this correctly - if I run 5 different run targets in quick succession, which have overlapping build requirements, the daemon knows that it should wait on another concurrent build to make progress. We see this behavior consistently for roughly half of our team.

For the other half, they consistently just get build failures (that, indeed, look like isolation issues.) But it feels like this is one of the primary use cases for buck2 - being able to have the build daemon and the more granular build graph handle this kind of optimization. Certainly for those of us who have it working, it is, as the kids say, super sweet. So the issue does feel somehow more fundamental.

adamhjk commented 1 year ago

@davidbarsky also, it makes sense that the LSP would want a separate isolation directory, because you don't want your editor to ever block another build. This is a slightly different problem. :)

nickgerace commented 1 year ago

Adding onto what @adamhjk said: in his terminal panes, there are logs containing waiting on another command task (or similar wording, cannot recall precisely). In my terminal panes, I don't see those logs at all, and each buck2 invocation appears to just continue as if they were building in isolation.

davidbarsky commented 1 year ago

@adamhjk Yup, it probably is: I ran this by Wendy—I don't think she's on GitHub—and she said that she suspects that "the buckconfigs for controlling concurrent command behavior is not enabled".

(I'm not sure how to enable it, as I unknowingly messaged her while she was on a road trip.)

adamhjk commented 1 year ago

Cool! we can dig around. I think it's related to this FAQ entry not quite doing what we expect:

https://buck2.build/docs/faq/common_issues/#are-multiple-concurrent-commands-supported

adamhjk commented 1 year ago

Doesn't seem to be a documented config option that turns this behavior on or off.

iguridi commented 1 year ago

I'm guessing you want this

In app/buck2_server/src/daemon/state.rs there's this code that should control it

        let nested_invocation_config = root_config
            .parse::<NestedInvocation>("buck2", "nested_invocation")?
            .unwrap_or(NestedInvocation::Run);

        let parallel_invocation_config = root_config
            .parse::<ParallelInvocation>("buck2", "parallel_invocation")?
            .unwrap_or(ParallelInvocation::Run);

Used in .buckconfig as

[buck2]
  /// other keys
  parallel_invocation = run | block | error
  nested_invocation = run | error

krallin commented 1 year ago

We really should flip the default for this to have the "correct" behavior. I'll send a diff for it.

adamhjk commented 1 year ago

On Mon, Jun 5, 2023 at 7:45 AM Thomas Orozco @.***> wrote:

We really should flip the default for this to have the "correct" behavior. I'll send a diff for it

Isn’t the default “run” tho, which is the correct behavior?

krallin commented 1 year ago

Isn’t the default “run” tho, which is the correct behavior?

The default is run, but to get correct behavior, you'd need:

[buck2]
  parallel_invocation = block
  nested_invocation = error
  dice_cleanup = block

nickgerace commented 1 year ago

Thank you @krallin! Haven't tried it yet, but I will soon.

facebook / buck2