Compile time (CodeRemote)

kapilsinha commented 7 months ago

Hi, Kapil here from CodeRemote. I wanted to continue our discussion from r/rust, and figured it would be easier to track here. Sounds like you're facing bad compile times here, and I'd like to see how I can help alleviate that

Venryx commented 7 months ago

Sure, tracking the timings here makes sense.

I profiled incremental builds (i.e. cargo [cmd], touch src/main.rs, cargo [cmd] again) in debate-map/app-server! Default compiler cargo check: 2.86 s ; My modded compiler cargo check: 2.33 s Default compiler cargo build: 6.88 s ; My modded compiler cargo build: 6.40 s I did a simple git clone and haven't messed with the code at all. There is a 17% improvement in cargo check time, but I'm not sure why you said an incremental cargo build (on the default compiler) is taking several minutes? FWIW I am running this locally on a Linux machine, sans Docker.

The slow increment cargo build happens after changing a single line of code here: https://github.com/debate-map/app/blob/8def743f6db26dbe3b495d018d7b452629baefd6/Packages/app-server/src/main.rs#L59

I simply add a "1" to the end of that log message, and that's when I see the slow incremental recompiles.

Here are my compile timings, when using cargo build: (ie. debug build)

# first build (not 100% pristine since forgot to clear "target" folder beforehand)
   [...]
   Compiling rust-shared v0.0.1 (C:\Root\Apps\@V\DebateMap\Main\Packages\rust-shared)
   Compiling app-server v0.1.0 (C:\Root\Apps\@V\DebateMap\Main\Packages\app-server)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 3m 48s

# recompile (no code change)
PS C:\Root\Apps\@V\DebateMap\Main\Packages\app-server> cargo build
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 5.80s

# recompile (no code change)
PS C:\Root\Apps\@V\DebateMap\Main\Packages\app-server> cargo build
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.94s

# recompile (after changing that one log line)
PS C:\Root\Apps\@V\DebateMap\Main\Packages\app-server> cargo build
   Compiling app-server v0.1.0 (C:\Root\Apps\@V\DebateMap\Main\Packages\app-server)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1m 02s

And here it is for cargo build --release:

# first build (not 100% pristine since forgot to clear "target" folder beforehand)
   [...]
   Compiling dyn-clone v1.0.6
   Compiling app-server v0.1.0 (C:\Root\Apps\@V\DebateMap\Main\Packages\app-server)
    Finished `release` profile [optimized] target(s) in 7m 57s

# recompile (no code change)
PS C:\Root\Apps\@V\DebateMap\Main\Packages\app-server> cargo build --release
    Finished `release` profile [optimized] target(s) in 1.48s

# recompile (no code change)
PS C:\Root\Apps\@V\DebateMap\Main\Packages\app-server> cargo build --release
    Finished `release` profile [optimized] target(s) in 1.12s

# recompile (after changing that one log line)
PS C:\Root\Apps\@V\DebateMap\Main\Packages\app-server> cargo build --release
   Compiling app-server v0.1.0 (C:\Root\Apps\@V\DebateMap\Main\Packages\app-server)
    Finished `release` profile [optimized] target(s) in 4m 52s

While I normally do the builds in docker, the timings above are done on my host machine (OS: Windows 10, CPU: Ryzen 7 1700X, 8-Core). The rest of my timings in this thread (unless otherwise specified) will also be done on my host machine, for consistency -- and to rule out the possibility of a Docker misconfiguration.

Anyway, as seen from the timings above, incremental recompiles are really quite slow for having changed only a single logging line. So quite curious to see what results you have with your modified rust! (which for reference for other readers, adds caching of the expansions of proc-macros when the token-stream input for those macros has not changed since the last compile)

kapilsinha commented 7 months ago

@Venryx My incremental builds are on the order of 14 seconds when I make that 1-line change. I'm happy to get you set up on a remote machine with my compiler, if you want to try it out. I just need your ssh pubkey for it. Feel free to email me at kapil@coderemote.dev

Venryx commented 7 months ago

@Venryx My incremental builds are on the order of 14 seconds when I make that 1-line change. I'm happy to get you set up on a remote machine with my compiler, if you want to try it out. I just need your ssh pubkey for it. Feel free to email me at kapil@coderemote.dev

I'm guessing the 14 seconds is for an incremental debug build, with your fork of rust?

For completeness' sake, it'd be cool to also know: (with the one-line change in each case)

Incremental debug build, on your machine, with stock rust.
Incremental release build, on your machine, with stock rust.
Incremental release build, on your machine, with your modified rust.

But in any case, 14 seconds sounds great! Will send you in an email in a bit.

Venryx commented 7 months ago

I'll include a summary here of my cargo check/build timings (portions also included in the email I just sent).

Notes:

All of these timings are the "incremental" build/check times, as executed after changing the log-message of the single println call in Packages/app-server/src/main.rs. [EDIT: Actually, it seems that incremental compilation is disabled by default for release builds. So any change in the target crate, means that whole crate will/did get recompiled I believe. Enabling incremental compile for release builds is not recommended, but I'm still tempted by it since it cuts down release builds on my local machine from ~5m to ~3m10s.]
Naturally, my custom "wrap_slow_macros" macro is disabled for all of these benchmarkings. (the macro technically gets called, but just immediately returns its input tokens) [to confirm that its "disabled form" has no impact, I tried the CR->build->release->modded config with these proc-macros commented, and got the same timings]
All of my local machine timings (well actually, all the timings below in general) are not done within Docker or anything. (mentioned since my normal build process for debate-map occurs within Docker; for these timings of course, I wanted to remove the Docker variable from the benchmarking process)
For most of the timings below (excluding eg. my local timings, though some of those are in my comment above), you can see the raw logs for the attempts here: https://github.com/debate-map/misc/blob/main/RustCompileBenchmarks/2024-03-26/Main.txt (some extra files can be seen here)

On my local machine, the timings are:

My local machine, cargo check, stock rust: ~8s
My local machine, cargo check --release, stock rust: ~34s
My local machine, cargo build, stock rust: ~1m2s [~27s at next recompile; see EDIT2]
My local machine, cargo build --release, stock rust: ~4m52s

On the CodeRemote build machine, the timings are:

CR build machine, cargo check, stock rust: ~3s
CR build machine, cargo check, modded rust: ~3s
CR build machine, cargo check --release, stock rust: ~12s
CR build machine, cargo check --release, modded rust: ~5s
CR build machine, cargo build, stock rust: ~8s
CR build machine, cargo build, modded rust: ~8s
CR build machine, cargo build --release, stock rust: ~1m20s
CR build machine, cargo build --release, modded rust: ~1m20s

First observation: The CodeRemote build machine's CPU (Ryzen 5 3600) should only be +17% faster (cpu benchmark here) than my CPU (Ryzen 7 1700x), yet it somehow is +300% faster for compiling this particular project!

So I will need to investigate and see what the heck is causing such a big compile-time difference between CPUs with similar specs.

EDIT: This isn't relevant for evaluation of the CR modded Rust, but interesting to note that when I skipped all the "slow macros" (by enabling the custom "wrap_slow_macros" for builds as well), the timings on my local machine's incremental builds were "~20s" (debug) and "~1m2s" (release). So the slowness of my local machine (relative to the build machine) is not solved, but it's worth noting that most of the compile time is indeed "due to" the proc-macros; however, despite the modded Rust caching their expansion, the macros' existence still results in a huge increase in compile times. (maybe the additional code that they output simply results in LLVM taking way longer to complete, ie. the issue might not be at the layer of the Rust frontend compiler)

EDIT2: Actually, it appears it's not that clear-cut. If I do more than one "incremental compile" (as defined above), then the second one somehow executes faster than the first one (similar to what I've observed with cargo-check). After multiple incremental compiles, my local (debug) build can get down to ~27s, even without the "wrap_slow_macros" macro enabled. Still higher than expected based on the general CPU benchmarks (and doesn't exactly/definitely explain diff from build machine), but more believable as involving just a hardware explanation rather than an issue with the config/local-environment. Also: I really should have been doing my benchmarks with "cargo rustc -- -Ztime-passes", since that includes precious details; like how for my local machine builds, the "run_linker" system (at point of ~27s incr-builds) takes ~11s (since mold linker isn't used/available on Windows).

EDIT3: Using the cargo rustc --release -- -Ztime-passes command (on local machine), I was able to find the section that is taking so long (see here for the full time-passes): 255.350; rss: 1838MB -> 322MB (-1516MB) LLVM_passes. So for some reason, the LLVM step in particular completes very slowly (4m15s) on my local machine, even for these incremental (release) builds.

Summary: For this particular project, the modded compiler appears to be helpful for "cargo check --release", but not for the other check/build configs. This is surprising, since I thought my project was bottlenecked by the proc-macro expansions, for incremental release builds. However, I am grateful to have been prompted to do these tests, since I can see plainly now how slow the compiles are on my local machine for some reason, which I will now investigate.

(And thank you to @kapilsinha for taking the time to set up a test environment for me on his build server! It was nice to get to try a mod of the compiler with caching of proc-macro expansions, and I wish him the best in his continued development of it, given the substantial speedups it's shown in some other benchmarked projects.)

frederikhors commented 7 months ago

The benchmarkings you posted http://github.com/debate-map/app/issues/285#issuecomment-2020222609 are very interesting.

Is Code Remote building on Windows like yourself? Or on Linux?

Venryx commented 7 months ago

Is Code Remote building on Windows like yourself? Or on Linux?

The Code Remote building is done on Linux. (and not within a Docker container or anything from what I could tell)

frederikhors commented 7 months ago

The Code Remote building is done on Linux. (and not within a Docker container or anything from what I could tell)

So you are comparing completely different things.

Try linux on windows subsystem with mold and tell us the results.

Venryx commented 7 months ago

Okay, I have redone the tests on my local machine, except within WSL2 rather than on Windows: (and naturally, I used a fresh clone of my repo within the linux file-system, to avoid cross-OS filesystem overhead; also used mold as you requested)

My local machine, WSL2, cargo build, stock rust: ~20s [compared to ~27s outside of WSL2]
My local machine, WSL2, cargo build --release, stock rust: ~3m01s [compared to ~4m52s outside of WSL2]

So indeed, it seems that building in Linux rather than Windows does give a substantial speed improvement.

Though, still slower than expected relative to the CodeRemote build machine -- which should only be +17% faster (based on relative cpu power/generic-benchmarks), but is still much more than that:

CR build machine, cargo build, stock rust: ~8s
CR build machine, cargo build --release, stock rust: ~1m20s

EDIT1: Anyone know why I'm getting substantially faster compiles within my WSL2 Linux? Why/how does Windows add so much overhead? (I already tried disabling my antivirus, with no timing change [since already added exception earlier], so that's not the issue. Maybe it is due to the WSL2 Linux system using ext4 for its file-system rather than NTFS, which I've heard is slower? Would be surprising to me if that's the only reason for that big of a speed diff, though...)

EDIT2: For completeness' sake, I also tried doing a cargo build --release (after one-line change) in Docker (the dockerfiles run within WSL2; and I have a --mount=type=cache,target=/dm_repo/target set up), and the timing was 3m24s. Which is better than I remember, and only slightly higher than building outside of the Dockerfiles.

EDIT3: I found that using Cranelift (and incremental compilation), I was able to drastically improve the speed of my release compiles (3m24s+ to ~20s) -- of course, others' projects may not have the same bottlenecks. More info on it (along with a compile-times table) in my comment here: https://github.com/async-graphql/async-graphql/issues/1287#issuecomment-2096520167

debate-map / app

Compile time (CodeRemote) #285