Buck2 performance on single-file update in erlang project

srstrong commented 4 months ago

Hi,

We're exploring porting our current build system from a mix of make/bazel/cargo/etc to buck2. First task was to just get our current erlang code building, which was delightfully easy. In particular, I like the automatic dependency checking that the erlang_application rule is doing, since that's something we are having to do ourselves as a pre-step before our current Bazel build. However, I'm seeing a big performance issue if I touch a single erlang file. It's a pretty big repo - around 3600 erlang source files with around 550k lines of code, but touching a single file takes around 30 seconds for the buck2 build to complete.

Running the build with no changes is almost instant.

Is this anything that anyone else has seen, or do I just have an abnormally large project? Any pointers you can give would be much appreciated.

JakobDegen commented 4 months ago

So, with unexpectedly slow builds there's basically two possibilities: 1. Something got rebuilt which you weren't expecting to get rebuilt, or 2. something that you were expecting to get rebuilt took longer than you thought it would. You can probably figure out which scenario you were in by running buck2 log critical-path after your command - this will tell you what buck2 was waiting on. If you want the exact commands that were executed, some grepping through buck2 log what-ran should give you that too.

If you can get similar information from another build system, it might be worth comparing the two to try and identify the cause of the relative slowdown.

Other than that, this is likely to require erlang specific expertise, which I have none of to offer. Will try and shop this around to some folks who might

srstrong commented 4 months ago

Thanks for getting back to me. I've just done a build with a single dirty file, and this gist has the console output from those commands - https://gist.github.com/srstrong/5ecd86f20dfd8ceb20c39ece695931c9

As you can see, the build itself took over 30 seconds, and in terms of what ran, it was just 4 commands (all of which I'd expect, and none take any particular time). I don't know what the numbers on the critical-path output represent, but hopefully they might mean something to you :)

JakobDegen commented 4 months ago

Yeah, the output format of the critical path command could use improving.

The important thing is the first number that you see, which is how long that step took in microseconds. I'm a little bit suspicous of what's going on there, since we're only reporting a 15 second critical path for a 30 second command. If you do a similar build, what does buck say it's doing in superconsole during that time (buck2 log replay helps for this)?

An additional thing that you might do to help debug is manually run all the commands the whatran says were executed and see how long they took

srstrong commented 4 months ago

Running the 4 commands gave timings of:

dependency_analyser: 0.173s app_resource: 0.189s dependency_finalizer: 0.293s erlc: 0.219s

so nothing substantial there. Console output looks something like:

[nix-shell:~/dev/norsk]$ buck2 build --console simpletty //:norsk-app
[2024-02-11T20:08:00.348+00:00] Build ID: 9cd38900-9529-4c94-a8cc-f7b9c3e8595d
[2024-02-11T20:08:00.348+00:00] File changed: root//server/output/Workflow.Metrics.StandardInstruments/workflow_metrics_standardInstruments@foreign.erl
[2024-02-11T20:08:07.443+00:00] Waiting on buck2 daemon 9cd38900-9529-4c94-a8cc-f7b9c3e8595d...
[2024-02-11T20:08:07.443+00:00] Resource usage: CPU: 948%
[2024-02-11T20:08:07.443+00:00] IO: none
[2024-02-11T20:08:14.443+00:00] Waiting on buck2 daemon 9cd38900-9529-4c94-a8cc-f7b9c3e8595d...
[2024-02-11T20:08:14.443+00:00] Resource usage: <snapshot is stale>
[2024-02-11T20:08:21.543+00:00] Waiting on root//:norsk-app (prelude//platforms:default#213ed1b7ab869379) -- action (app_resource norsk-app.app(erlang-local)) [local_execute], and 1 other actions
[2024-02-11T20:08:28.543+00:00] Waiting on root//:norsk-app (prelude//platforms:default#213ed1b7ab869379) -- action (app_resource norsk-app.app(erlang-local)) [local_execute], and 1 other actions
[2024-02-11T20:08:35.643+00:00] Waiting on root//:norsk-app (prelude//platforms:default#213ed1b7ab869379) -- action (erlc workflow_metrics_standardInstruments@foreign.erl(erlang-local)) [local_execute]
[2024-02-11T20:08:36.554+00:00] Cache hits: 0%
[2024-02-11T20:08:36.554+00:00] Commands: 4 (cached: 0, remote: 0, local: 4)
BUILD SUCCEEDED

I also ran with --ui dice and this was the state at around 30s - those counters for BuildKey etc had been incrementing over the course of the 30s

[nix-shell:~/dev/norsk]$ buck2 build --ui dice //:norsk-app
File changed: root//server/output/Workflow.Metrics.StandardInstruments/workflow_metrics_standardInstruments@foreign.erl
Build ID: 11ce9cce-4afc-4508-8206-c57b91cd1943
Dice Key States
    Key                                       ChkDeps  Pending    Done
----------------------------------------------------------------------
    BuildKey                                 |  5.0K  |    2   |    0
    DeferredCompute                          |  4.7K  |    0   |    0
    DeferredResolve                          |  4.7K  |    0   |    0
    PathMetadataKey                          |    0   |    0   |    1
----------------------------------------------------------------------
Command: build.                                                                                                                                     Remaining: 5.0K/5.0K. Time elapsed: 30.7s
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

after completion, this was the final state:

[nix-shell:~/dev/norsk]$ buck2 build --ui dice //:norsk-app
File changed: root//server/output/Workflow.Metrics.StandardInstruments/workflow_metrics_standardInstruments@foreign.erl
Build ID: 17d51a6d-5a95-4858-b800-a1d4beb111f4
Dice Key States
    Key                                       ChkDeps  Pending    Done
----------------------------------------------------------------------
    BuildKey                                 |    2   |    2   |    3
    DeferredCompute                          |    3   |    0   |    0
    DeferredResolve                          |    3   |    0   |    0
    PathMetadataKey                          |    0   |    0   |    1
----------------------------------------------------------------------
Jobs completed: 18. Time elapsed: 35.4s.
Cache hits: 0%. Commands: 4 (cached: 0, remote: 0, local: 4)
BUILD SUCCEEDED

srstrong commented 4 months ago

Final update from tonight - I just generated a BUCK file with around 1700 targets, one for each erlang source file (this is pretty much how we do things in our current bezel system). Using that and my own rule for compiling the erlang, a build after a single-file change completes on 0.2 seconds, which is much more like it! I wonder if it's some interaction with the dynamic_output stuff that the provided erlang_application rule is using? I've not looked at the buck2 code at all, so this is purely a guess :)

TheGeorge commented 4 months ago

Hi, I have context on the Erlang rules and I am trying to understand what's so slow here. Can you share some basic stats? How many applications, and how many modules per applications does your project approximately have?

I just generated a BUCK file with around 1700 targets

I think this will always be the fastest way to produce builds, since you are giving explicit dependencies for each module to buck2. For the Erlang rules we built, we aimed to make module dependencies discovered automatically based on the application dependencies, which creates some overhead. But 30s for a single file change seems excessive. I tried to reproduce this with disabled caches on local builds on a larger code-base, but am pretty far from 30s.

srstrong commented 4 months ago

The structure from an erlang perspective is probably a little unusual - our primary development language is Purescript, using purerl to compile the source purescript into erlang, and then onto a regular erlang release. In the erlang world, post the purescript step, this turns into a single application with some 1700 erlang modules - probably larger than the average Erlang application!

In our current build system, we are using bazel and we have a pre-build step that generates a bazel BUILD file with a rule for each of the erlang modules, with their dependencies (code in that generator being very similar to what's present in the BUCK erlang rules, doing a parse of the source to detect includes, parse transforms etc). That works ok, but bazel itself is not the fastest thing so we sit at about 8-10s for a single-file rebuild.

Was hoping with buck and the dynamic-outputs to avoid this extra step - and the erlang rules do pretty much exactly what we want. My current buck2 test is just a single BUCK file with a single erlang-application, where the sources are just a glob into the purescript compiler's output directory. Note that all of this is post the purescript compiler, so other than the context as to why we have such a large single application, there's nothing purescript-specific going on.

If there are any specific tests or stats that you'd like, I'd be happy to share

TheGeorge commented 4 months ago

Thanks for the context, let me try to investigate on our side what could cause the slowdown, and come back to you.

srstrong commented 4 months ago

On the off chance that it helps, I have made a very simple repro outside of our main project. I have a BUCK file:

erlang_application (
    name = "test-app",
    srcs = glob(["output/**/*.erl", "output/**/*.hrl"]),
    applications = [
        "kernel",
        "stdlib",
    ],
    build_edoc_chunks = False,
    erl_opts = [],
)

and a gen.sh:

#!/bin/bash

rm -rf output
mkdir -p output

for i in {1..2000}; do
  cat << EOF > "output/module_${i}.erl"
-module(module_$i).
-export([foo/0]).

-include("module_$i.hrl").

foo() ->
  ok.  
EOF

  cat << EOF > "output/module_${i}.hrl"
-define(A, 1).
EOF
done

assuming presence of prelude, third-party, toolchain etc, then this replicates exactly what I'm seeing on our main project. Initial build takes some time, but don't much care about that. But then a single file change takes in excess of 30s on my machine.

If I drop gen.sh to generate 1000 files instead of 2000 (and do buck2 clean), then a single-file rebuild is taking 9.3s. 500 files and it drops to 2.4s and at 250 files it's down to 0.8s....

JakobDegen commented 4 months ago

Ah, I figured it out. This slowness was the result of an accidentally introduced lock in some core code. I've submitted a change internally and verified that using your repro, the rebuild time goes from 26s to 6s. I'll link the commit with a full description of the issue here once it lands

Thank you for reporting this and sharing the repro! It's quite possible that you're not the first one to run into it and that it's actually been causing unnecessary slowness for many users

JakobDegen commented 4 months ago

Actually, I have a more useful recommendation for you. Try setting

[buck2]
  dice = modern

In your root .buckconfig. DICE is our core incrementality engine, and this will switch you to a newer implementation (which will eventually become the default). That avoids this issue entirely.

The build is still a bit slower than I'd like, but at least now it's sort of reasonable at 5 seconds instead of being completely pathological

srstrong commented 4 months ago

Oh wow, that's just awesome. Updated my .buckconfig and my single-file rebuild on my main project is now 2s, that's very much a result! Thanks for looking into this, much appreciated

JakobDegen commented 4 months ago

Explanation of what went wrong is in the above commit. It may be the case that a master rebuild would allow you to not have to opt into modern dice anymore

facebook / buck2

Buck2 performance on single-file update in erlang project #555