Closed kapilsinha closed 7 months ago

Hi, Kapil here from CodeRemote. I wanted to continue our discussion from r/rust, and figured it would be easier to track here. Sounds like you're facing bad compile times, and I'd like to see how I can help alleviate that.
Sure, tracking the timings here makes sense.
I profiled incremental builds (i.e. cargo [cmd], touch src/main.rs, cargo [cmd] again) in debate-map/app-server!
- Default compiler cargo check: 2.86s; my modded compiler cargo check: 2.33s
- Default compiler cargo build: 6.88s; my modded compiler cargo build: 6.40s
I did a simple git clone and haven't messed with the code at all. That's a ~18% improvement in cargo check time, but I'm not sure why you said an incremental cargo build (on the default compiler) is taking several minutes? FWIW, I am running this locally on a Linux machine, sans Docker.
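Concretely, the loop I timed looks roughly like this (a sketch, run from within Packages/app-server of a fresh clone):

# Warm the incremental cache, then measure a "touch-only" rebuild
cargo check
touch src/main.rs
time cargo check

# Same measurement for a debug build
touch src/main.rs
time cargo build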
The slow incremental cargo build happens after changing a single line of code here: https://github.com/debate-map/app/blob/8def743f6db26dbe3b495d018d7b452629baefd6/Packages/app-server/src/main.rs#L59
I simply add a "1" to the end of that log message, and that's when I see the slow incremental recompiles.
Here are my compile timings when using cargo build (ie. debug build):
# first build (not 100% pristine since forgot to clear "target" folder beforehand)
[...]
Compiling rust-shared v0.0.1 (C:\Root\Apps\@V\DebateMap\Main\Packages\rust-shared)
Compiling app-server v0.1.0 (C:\Root\Apps\@V\DebateMap\Main\Packages\app-server)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 3m 48s
# recompile (no code change)
PS C:\Root\Apps\@V\DebateMap\Main\Packages\app-server> cargo build
Finished `dev` profile [unoptimized + debuginfo] target(s) in 5.80s
# recompile (no code change)
PS C:\Root\Apps\@V\DebateMap\Main\Packages\app-server> cargo build
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.94s
# recompile (after changing that one log line)
PS C:\Root\Apps\@V\DebateMap\Main\Packages\app-server> cargo build
Compiling app-server v0.1.0 (C:\Root\Apps\@V\DebateMap\Main\Packages\app-server)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 1m 02s
And here it is for cargo build --release:
# first build (not 100% pristine since forgot to clear "target" folder beforehand)
[...]
Compiling dyn-clone v1.0.6
Compiling app-server v0.1.0 (C:\Root\Apps\@V\DebateMap\Main\Packages\app-server)
Finished `release` profile [optimized] target(s) in 7m 57s
# recompile (no code change)
PS C:\Root\Apps\@V\DebateMap\Main\Packages\app-server> cargo build --release
Finished `release` profile [optimized] target(s) in 1.48s
# recompile (no code change)
PS C:\Root\Apps\@V\DebateMap\Main\Packages\app-server> cargo build --release
Finished `release` profile [optimized] target(s) in 1.12s
# recompile (after changing that one log line)
PS C:\Root\Apps\@V\DebateMap\Main\Packages\app-server> cargo build --release
Compiling app-server v0.1.0 (C:\Root\Apps\@V\DebateMap\Main\Packages\app-server)
Finished `release` profile [optimized] target(s) in 4m 52s
While I normally do the builds in Docker, the timings above were taken on my host machine (OS: Windows 10; CPU: Ryzen 7 1700X, 8-core). The rest of my timings in this thread (unless otherwise specified) will also be done on my host machine, for consistency -- and to rule out the possibility of a Docker misconfiguration.
Anyway, as seen from the timings above, incremental recompiles are really quite slow for having changed only a single logging line. So I'm quite curious to see what results you get with your modified rust! (which, for reference for other readers, adds caching of proc-macro expansions when the token-stream input for those macros has not changed since the last compile)
@Venryx My incremental builds are on the order of 14 seconds when I make that 1-line change. I'm happy to get you set up on a remote machine with my compiler, if you want to try it out. I just need your ssh pubkey for it. Feel free to email me at kapil@coderemote.dev
I'm guessing the 14 seconds is for an incremental debug build, with your fork of rust?
For completeness' sake, it'd be cool to also know the timings for the other check/build configurations (with the one-line change in each case).
But in any case, 14 seconds sounds great! Will send you an email in a bit.
I'll include a summary here of my cargo check/build timings (portions were also included in the email I just sent).
Notes: The "incremental" timings below are measured after changing the one println call in Packages/app-server/src/main.rs (as described earlier). [EDIT: Actually, it seems that incremental compilation is disabled by default for release builds. So any change in the target crate means that whole crate will/did get recompiled, I believe. Enabling incremental compilation for release builds is not recommended, but I'm still tempted by it, since it cuts release builds on my local machine from ~5m to ~3m10s.]
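For anyone wanting to try that, here's a minimal sketch of the two standard ways to enable it (both are stock Cargo features; no modded compiler needed):

# One-off: override the profile setting for a single invocation
CARGO_INCREMENTAL=1 cargo build --release

# Persistent: add to Cargo.toml
# [profile.release]
# incremental = true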
On my local machine, the timings are:
- cargo check, stock rust: ~8s
- cargo check --release, stock rust: ~34s
- cargo build, stock rust: ~1m2s [~27s at next recompile; see EDIT2]
- cargo build --release, stock rust: ~4m52s

On the CodeRemote build machine, the timings are:
- cargo check, stock rust: ~3s
- cargo check, modded rust: ~3s
- cargo check --release, stock rust: ~12s
- cargo check --release, modded rust: ~5s
- cargo build, stock rust: ~8s
- cargo build, modded rust: ~8s
- cargo build --release, stock rust: ~1m20s
- cargo build --release, modded rust: ~1m20s

First observation: The CodeRemote build machine's CPU (Ryzen 5 3600) should only be +17% faster (cpu benchmark here) than my CPU (Ryzen 7 1700X), yet it is somehow +300% faster for compiling this particular project!
So I will need to investigate and see what the heck is causing such a big compile-time difference between CPUs with similar specs.
EDIT: This isn't relevant for evaluation of the CR modded Rust, but interesting to note that when I skipped all the "slow macros" (by enabling the custom "wrap_slow_macros" for builds as well), the timings on my local machine's incremental builds were "~20s" (debug) and "~1m2s" (release). So the slowness of my local machine (relative to the build machine) is not solved, but it's worth noting that most of the compile time is indeed "due to" the proc-macros; however, despite the modded Rust caching their expansion, the macros' existence still results in a huge increase in compile times. (maybe the additional code that they output simply results in LLVM taking way longer to complete, ie. the issue might not be at the layer of the Rust frontend compiler)
EDIT2: Actually, it appears it's not that clear-cut. If I do more than one "incremental compile" (as defined above), the second one somehow executes faster than the first (similar to what I've observed with cargo-check). After multiple incremental compiles, my local (debug) build can get down to ~27s, even without the "wrap_slow_macros" macro enabled. That's still higher than expected based on the general CPU benchmarks (and doesn't fully explain the diff from the build machine), but it's more believable as a plain hardware difference rather than an issue with my config/local environment. Also: I really should have been doing my benchmarks with "cargo rustc -- -Ztime-passes", since that includes valuable details -- like how, for my local machine builds, the "run_linker" step (at the point of ~27s incr-builds) takes ~11s (since the mold linker isn't available on Windows).
EDIT3: Using the cargo rustc --release -- -Ztime-passes command (on my local machine), I was able to find the section that is taking so long (see here for the full time-passes output):
255.350; rss: 1838MB -> 322MB (-1516MB) LLVM_passes
So for some reason, the LLVM step in particular completes very slowly (4m15s) on my local machine, even for these incremental (release) builds.
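For anyone wanting to reproduce this, a sketch of the invocation plus a quick way to surface the slowest passes (-Z flags require a nightly toolchain, and the grep/sort post-processing is just my own convenience, so adjust to taste):

# Emit per-pass timings, then show the slowest entries (eg. LLVM_passes, run_linker)
cargo +nightly rustc --release -- -Ztime-passes 2>&1 | grep '^time:' | sort -k2 -rn | head -20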
Summary: For this particular project, the modded compiler appears to be helpful for "cargo check --release", but not for the other check/build configs. This is surprising, since I thought my project was bottlenecked by the proc-macro expansions, for incremental release builds. However, I am grateful to have been prompted to do these tests, since I can see plainly now how slow the compiles are on my local machine for some reason, which I will now investigate.
(And thank you to @kapilsinha for taking the time to set up a test environment for me on his build server! It was nice to get to try a mod of the compiler with caching of proc-macro expansions, and I wish him the best in his continued development of it, given the substantial speedups it's shown in some other benchmarked projects.)
The benchmarks you posted (http://github.com/debate-map/app/issues/285#issuecomment-2020222609) are very interesting.
Is Code Remote building on Windows like yourself? Or on Linux?
The Code Remote building is done on Linux. (and not within a Docker container or anything from what I could tell)
So you are comparing completely different things.
Try Linux on the Windows Subsystem for Linux (WSL) with mold, and tell us the results.
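(For reference, a sketch of two common ways to wire up mold on Linux, assuming it's installed; depending on your cc version you may need clang as the linker driver, and a persistent setup via .cargo/config.toml uses the same flag:)

# Option 1: let mold interpose as the linker for this build
mold -run cargo build

# Option 2: pass the linker flag explicitly
RUSTFLAGS="-C link-arg=-fuse-ld=mold" cargo build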
Okay, I have redone the tests on my local machine, except within WSL2 rather than on Windows (and naturally, I used a fresh clone of my repo within the Linux file-system, to avoid cross-OS filesystem overhead; I also used mold, as requested):
- cargo build, stock rust: ~20s [compared to ~27s outside of WSL2]
- cargo build --release, stock rust: ~3m01s [compared to ~4m52s outside of WSL2]

So indeed, it seems that building in Linux rather than Windows gives a substantial speed improvement.
Though it's still slower than expected relative to the CodeRemote build machine, which should only be ~17% faster (based on relative CPU power/generic benchmarks), yet the gap is still much larger than that:
- cargo build, stock rust: ~8s
- cargo build --release, stock rust: ~1m20s

EDIT1: Anyone know why I'm getting substantially faster compiles within my WSL2 Linux? Why/how does Windows add so much overhead? (I already tried disabling my antivirus, with no timing change [I had already added an exception earlier], so that's not the issue. Maybe it's due to the WSL2 Linux system using ext4 for its file-system rather than NTFS, which I've heard is slower? I'd be surprised if that were the only reason for that big of a speed diff, though...)
EDIT2: For completeness' sake, I also tried doing a cargo build --release (after the one-line change) in Docker (the dockerfiles run within WSL2, and I have a --mount=type=cache,target=/dm_repo/target set up), and the timing was 3m24s. That's better than I remember, and only slightly higher than building outside of the Dockerfiles.
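For context, that cache mount is a Docker BuildKit feature; here's a hypothetical Dockerfile excerpt showing its shape (the build command is illustrative, not the repo's exact one):

# Hypothetical excerpt: persist the cargo target dir across image builds (requires BuildKit)
RUN --mount=type=cache,target=/dm_repo/target \
    cargo build --release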
EDIT3: I found that using Cranelift (and incremental compilation), I was able to drastically improve the speed of my release compiles (3m24s+ to ~20s) -- of course, others' projects may not have the same bottlenecks. More info on it (along with a compile-times table) in my comment here: https://github.com/async-graphql/async-graphql/issues/1287#issuecomment-2096520167
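For anyone wanting to try the same, a sketch of the Cranelift setup I'd expect (nightly-only; verify the component/flag names against the current cg_clif docs):

# Install the Cranelift codegen backend for the nightly toolchain
rustup component add rustc-codegen-cranelift-preview --toolchain nightly

# Build with Cranelift, forcing incremental codegen for the release profile
CARGO_PROFILE_RELEASE_INCREMENTAL=true RUSTFLAGS="-Zcodegen-backend=cranelift" cargo +nightly build --release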