Open krisnova opened 1 year ago
Was able to get a 30 minute build.. ouch
naive attempt:
$ rustup default nightly
$ export RUSTFLAGS=-Zself-profile
$ make all
unfortunately this build fails due to tonic generating code that fails the nightly clippy checks. adding an #![allow(clippy::all)]
to the "./proto/src/lib.rs" file helps get past that.
aer falls foul of some other clippy checks that would need judicious allow statements. in the meantime i'll see if i can get results just for auraed.
$ cd auraed && cargo rustc --bin auraed -- -Zself-profile
works but only for the final crate. let's go deeper:
$ RUSTFLAGS="-Zself-profile=$(pwd)/profiles -Zself-profile-events=default,args" cargo build --bin auraed
$ crox --minimum-duration 10 --dir $(pwd)/profiles
oh look. a 2.4 GB JSON file for the chrome profiler to play with. which it rightly refused to load.
increasing the minimum duration to 3000 (3ms) helps. now it's only 23MB.
summary: auraed takes about 5.5mins to build clean, with all but 47secs of that being dependencies. worst offenders are tonic, netlink_proto, axum, and protobuf. ie, nothing surprising.
so the slowness is likely that we have several separately-built crates, each of which defines its own set of dependencies that need checking, maybe downloading, and maybe building. if each takes 2mins, we're already over 30mins.
naively i'd expect that if we merge these into a single crate we'd end up spending less time dealing with dependencies but having less flexibility to build things individually.
The guidance for Rust is usually to break projects up into small crates since crates that don't depend on each other can be built in parallel.
By looking at an action that missed the cache and picking a single dependency (I'm picking serde under the assumption that it is a common dependency; it appears 44x in our Cargo.lock), we can get some hints on how Cargo reuses its output.
The following log lines appear for serde:
cargo build --target x86_64-unknown-linux-musl -p auraed
Downloaded serde v1.0.152
Compiling serde v1.0.152
...
cargo build -p auraescript
Compiling serde v1.0.152
...
cargo build -p aer
Compiling serde v1.0.152
...
cargo clippy --all-features --workspace --exclude auraed --exclude auraescript --exclude aer -- -D clippy::all -D warnings
Checking serde v1.0.152
...
cargo clippy --target x86_64-unknown-linux-musl -p auraed --all-features -- -D clippy::all -D warnings
Checking serde v1.0.152
...
cargo clippy -p auraescript --all-features -- -D clippy::all -D warnings
Checking serde v1.0.152
...
cargo clippy -p aer --all-features -- -D clippy::all -D warnings
# serde does not get checked even though `aer` has a dependency on it
...
cargo install --target x86_64-unknown-linux-musl --path ./auraed --debug --force
Downloaded serde v1.0.158
Compiling serde v1.0.158
...
cargo install --path ./auraescript --debug --force
Compiling serde v1.0.158
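A quick way to confirm this kind of version duplication is to look for multiple locked versions of the same dependency. Here's a sketch against a fabricated two-entry lock file reproducing the two serde versions seen in the logs above; on the real workspace you'd point it at ./Cargo.lock instead (or just run cargo tree --duplicates):

```shell
# Fabricated lock file with the two serde versions from the logs above.
cat > /tmp/Cargo.lock <<'EOF'
[[package]]
name = "serde"
version = "1.0.152"

[[package]]
name = "serde"
version = "1.0.158"
EOF

# List the distinct locked versions of serde.
grep -A1 'name = "serde"' /tmp/Cargo.lock | grep '^version' | sort -u
```

Two distinct versions in the output means the dependency gets compiled (at least) twice.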
Notes:
- cargo build does a single download regardless of target
- cargo install won't use the download from cargo build, so it downloads again
- cargo build -p WHATEVER will recompile the dependency
- cargo install -p WHATEVER will recompile the dependency
- cargo clippy sometimes rechecks dependencies. My only guess is that it only caches the last target used, since it rechecked as we changed from the default target to the explicit x86 target and then back to the default, but didn't recheck when aer ran, which would have been a second default-target check in a row (the auraescript check ran right before it)

It looks like we can eliminate two cargo clippy calls by altering the third:
- eliminate cargo clippy -p auraescript --all-features -- -D clippy::all -D warnings (1m 02s)
- eliminate cargo clippy -p aer --all-features -- -D clippy::all -D warnings (30.56s)
- change cargo clippy --all-features --workspace --exclude auraed --exclude auraescript --exclude aer -- -D clippy::all -D warnings to cargo clippy --all-features --workspace --exclude auraed -- -D clippy::all -D warnings (should be less than an additional 1m 30s since there will be shared dependencies)

It is also possible that we can add cargo build --workspace --exclude auraed and eliminate the auraescript/aer individual builds.
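If that consolidation pans out, the lint stage would collapse to two clippy invocations instead of four. Sketched as a CI fragment (the step names and YAML shape are hypothetical, not the project's actual workflow):

```yaml
# Hypothetical consolidated lint job: one workspace-wide clippy on the
# default target (auraescript and aer included), plus the musl-target
# check for auraed.
- name: Lint workspace (default target)
  run: cargo clippy --all-features --workspace --exclude auraed -- -D clippy::all -D warnings
- name: Lint auraed (musl target)
  run: cargo clippy --target x86_64-unknown-linux-musl -p auraed --all-features -- -D clippy::all -D warnings
```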
TL;DR: We should discuss why, what, and when we run certain CI commands. More to the point, the analysis below points out some potential places we can improve the build times.
A script for replicating the analysis from this post can be found at the bottom.
From my initial investigation, it seems that the most time-consuming operations are dependency management (particularly on fresh builds, as you might expect), the tests, and linking.
During this investigation, it became apparent that I didn't understand cargo's caching mechanism and future-highway's comment confirmed it. I couldn't figure out how to handle the cargo build cache across the aurae workspace, so I'll be using sccache.
This analysis was performed on a qemu Ubuntu VM with 4 vCPUs from a i9-9980HK and 16GBs of memory allocated to the VM.
Results from cargo test --workspace --timings after a cargo clean (showing 30s+ units):
Results from cargo test --workspace --timings with cached dependencies (showing 30s+ units):
Results from RUSTFLAGS="-Clinker=clang -Clink-arg=-fuse-ld=lld -Cincremental=false" cargo test --workspace --timings with cached dependencies (more on linkers in a moment, showing 10s+ units):
Linking time with the default linker:
❯ rg '(link|total)' aer-default.txt
173:time: 0.158; rss: 173MB -> 173MB ( +0MB) link_binary_check_files_are_writeable
174:time: 58.611; rss: 173MB -> 173MB ( +0MB) run_linker
175:time: 0.162; rss: 173MB -> 173MB ( +0MB) link_binary_remove_temps
176:time: 58.932; rss: 173MB -> 173MB ( +0MB) link_binary
177:time: 58.937; rss: 173MB -> 114MB ( -59MB) link_crate
178:time: 59.534; rss: 184MB -> 114MB ( -70MB) link
179:time: 63.064; rss: 29MB -> 97MB ( +68MB) total
❯ rg '(link|total)' auraed-default.txt
103: Compiling netlink-packet-utils v0.5.1
112: Compiling netlink-packet-core v0.4.2
130: Compiling netlink-sys v0.8.4
144: Compiling netlink-proto v0.10.0
151: Compiling netlink-packet-route v0.13.0
166: Compiling rtnetlink v0.11.0
220:time: 0.224; rss: 298MB -> 298MB ( +0MB) link_binary_check_files_are_writeable
221:time: 152.282; rss: 298MB -> 295MB ( -4MB) run_linker
222:time: 0.298; rss: 136MB -> 136MB ( +0MB) link_binary_remove_temps
223:time: 152.823; rss: 298MB -> 136MB ( -162MB) link_binary
224:time: 152.823; rss: 298MB -> 136MB ( -162MB) link_crate
225:time: 153.706; rss: 338MB -> 136MB ( -202MB) link
226:time: 165.519; rss: 29MB -> 104MB ( +76MB) total
❯ rg '(link|total)' auraescript-default.txt
201:time: 0.058; rss: 155MB -> 155MB ( +0MB) link_binary_check_files_are_writeable
202:time: 170.649; rss: 155MB -> 155MB ( +1MB) run_linker
203:time: 0.071; rss: 146MB -> 146MB ( +0MB) link_binary_remove_temps
204:time: 170.781; rss: 155MB -> 146MB ( -9MB) link_binary
205:time: 170.781; rss: 155MB -> 146MB ( -9MB) link_crate
206:time: 171.001; rss: 155MB -> 146MB ( -9MB) link
207:time: 173.885; rss: 29MB -> 96MB ( +68MB) total
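To put those default-linker numbers in perspective, run_linker dominates the total. A small awk sketch over the aer figures quoted above (same time:/total lines, trimmed to the two that matter):

```shell
# Reproduce the aer default-linker "time:" lines and compute how much of
# the total compile time was spent inside the linker.
cat > /tmp/aer-times.txt <<'EOF'
time: 58.611; rss: 173MB -> 173MB ( +0MB) run_linker
time: 63.064; rss: 29MB -> 97MB ( +68MB) total
EOF
awk -F'[ ;]+' '/run_linker/ {l=$2} /total/ {t=$2} END {printf "linker share: %.0f%%\n", 100 * l / t}' /tmp/aer-times.txt
# -> linker share: 93%
```

So for an incremental aer rebuild with the default linker, roughly 93% of the wall time is linking.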
Linking time with lld:
❯ rg '(link|total)' aer-lld.txt
171:time: 0.169; rss: 143MB -> 143MB ( +0MB) link_binary_check_files_are_writeable
172:time: 9.499; rss: 143MB -> 143MB ( +0MB) run_linker
173:time: 0.173; rss: 143MB -> 143MB ( +0MB) link_binary_remove_temps
174:time: 9.844; rss: 143MB -> 143MB ( +0MB) link_binary
175:time: 9.844; rss: 143MB -> 143MB ( +0MB) link_crate
176:time: 10.830; rss: 143MB -> 143MB ( +0MB) link
177:time: 13.903; rss: 29MB -> 102MB ( +73MB) total
❯ rg '(link|total)' auraed-lld.txt
104: Compiling netlink-packet-utils v0.5.1
110: Compiling netlink-packet-core v0.4.2
131: Compiling netlink-sys v0.8.4
147: Compiling netlink-proto v0.10.0
152: Compiling netlink-packet-route v0.13.0
165: Compiling rtnetlink v0.11.0
219:time: 0.270; rss: 216MB -> 216MB ( +0MB) link_binary_check_files_are_writeable
220:time: 19.198; rss: 216MB -> 216MB ( +0MB) run_linker
221:time: 0.290; rss: 124MB -> 124MB ( +0MB) link_binary_remove_temps
222:time: 19.767; rss: 216MB -> 124MB ( -92MB) link_binary
223:time: 19.767; rss: 216MB -> 124MB ( -92MB) link_crate
224:time: 21.418; rss: 281MB -> 124MB ( -157MB) link
225:time: 27.115; rss: 29MB -> 95MB ( +67MB) total
❯ rg '(link|total)' auraescript-lld.txt
200:time: 0.053; rss: 135MB -> 135MB ( +0MB) link_binary_check_files_are_writeable
201:time: 16.251; rss: 135MB -> 135MB ( +1MB) run_linker
202:time: 0.052; rss: 129MB -> 129MB ( +0MB) link_binary_remove_temps
203:time: 16.358; rss: 135MB -> 129MB ( -6MB) link_binary
204:time: 16.358; rss: 135MB -> 129MB ( -6MB) link_crate
205:time: 16.706; rss: 135MB -> 129MB ( -6MB) link
206:time: 19.145; rss: 29MB -> 95MB ( +67MB) total
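If lld wins out, one way to avoid typing the RUSTFLAGS incantation on every invocation is a cargo config entry. A sketch (the target triple here is an assumption based on the default-target builds above):

```toml
# .cargo/config.toml
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=lld"]
```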
Linking time with mold:
❯ rg '(link|total)' aer-mold.txt
167:time: 0.173; rss: 143MB -> 143MB ( +0MB) link_binary_check_files_are_writeable
168:time: 9.127; rss: 143MB -> 143MB ( +0MB) run_linker
169:time: 0.206; rss: 113MB -> 113MB ( +0MB) link_binary_remove_temps
170:time: 9.510; rss: 143MB -> 113MB ( -30MB) link_binary
171:time: 9.510; rss: 143MB -> 113MB ( -30MB) link_crate
172:time: 10.497; rss: 143MB -> 113MB ( -30MB) link
173:time: 13.286; rss: 29MB -> 102MB ( +74MB) total
❯ rg '(link|total)' auraed-mold.txt
104: Compiling netlink-packet-utils v0.5.1
108: Compiling netlink-packet-core v0.4.2
130: Compiling netlink-sys v0.8.4
146: Compiling netlink-proto v0.10.0
151: Compiling netlink-packet-route v0.13.0
162: Compiling rtnetlink v0.11.0
219:time: 0.275; rss: 213MB -> 213MB ( +0MB) link_binary_check_files_are_writeable
220:time: 19.409; rss: 210MB -> 210MB ( +0MB) run_linker
221:time: 0.339; rss: 120MB -> 120MB ( +0MB) link_binary_remove_temps
222:time: 20.033; rss: 213MB -> 120MB ( -93MB) link_binary
223:time: 20.033; rss: 213MB -> 120MB ( -93MB) link_crate
224:time: 21.605; rss: 279MB -> 120MB ( -159MB) link
225:time: 26.949; rss: 29MB -> 95MB ( +66MB) total
❯ rg '(link|total)' auraescript-mold.txt
201:time: 0.054; rss: 135MB -> 135MB ( +0MB) link_binary_check_files_are_writeable
202:time: 18.446; rss: 135MB -> 135MB ( +1MB) run_linker
203:time: 0.064; rss: 135MB -> 135MB ( +0MB) link_binary_remove_temps
204:time: 18.565; rss: 135MB -> 135MB ( +1MB) link_binary
205:time: 18.565; rss: 135MB -> 135MB ( +1MB) link_crate
206:time: 18.898; rss: 135MB -> 135MB ( +1MB) link
207:time: 21.611; rss: 29MB -> 96MB ( +67MB) total
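Pulling the run_linker seconds from the three sets of logs above into one place, a quick awk sketch of the speedups:

```shell
# run_linker seconds copied from the logs above: default, lld, mold.
awk 'BEGIN {
  print "crate        default    lld   mold  lld-speedup"
  printf "aer          %7.1f %6.1f %6.1f %10.1fx\n", 58.611, 9.499, 9.127, 58.611 / 9.499
  printf "auraed       %7.1f %6.1f %6.1f %10.1fx\n", 152.282, 19.198, 19.409, 152.282 / 19.198
  printf "auraescript  %7.1f %6.1f %6.1f %10.1fx\n", 170.649, 16.251, 18.446, 170.649 / 16.251
}'
```

Either alternative linker cuts link time by roughly 6-10x on this machine, with lld and mold within noise of each other.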
An important note regarding linkers is that cargo will create a separate binary for each integration test file, and each of those binaries goes through the whole linking rigamarole. This is why I included the lld run in the above cargo test timings. One way to circumvent a large portion of this problem is to use a single test binary, i.e., put all the tests in another directory.
Another important note is that I couldn't get mold to work properly with the integration tests. Needs further investigation.
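The single-test-binary idea can be sketched in Cargo.toml (the names and paths here are assumptions for illustration, not the crate's actual layout): disable test auto-discovery and declare one [[test]] target whose entry point pulls the old test files in as modules, so the linker runs once instead of once per file.

```toml
# Cargo.toml (sketch)
[package]
autotests = false        # stop cargo building one binary per tests/*.rs file

[[test]]
name = "integration"     # hypothetical name
path = "tests/main.rs"   # tests/main.rs contains `mod foo;` etc. for each old file
```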
As I mentioned in the beginning, I don't quite understand how cargo caches the dependencies across a workspace, but at this point, I suspect it doesn't. We might be able to figure out a better keying method for our current caching mechanism in our CI, or swap to sccache after better understanding how all these caches work.
This isn't in the script below.
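For reference, the sccache hookup is a one-liner either way: the script below uses the RUSTC_WRAPPER environment variable, and the equivalent persistent setting in cargo's config is:

```toml
# .cargo/config.toml
[build]
rustc-wrapper = "sccache"
```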
Here's the timing after implementing the caching with sccache, linking via lld, and combining the integration tests (showing all 5s+ units):
Total time: 201.1s (3m 21.1s)
A great constructive exercise we can do as part of this ticket is to discuss how we're using CI. I was watching Nova's stream when this ticket was made, and it seemed like the root problem was that Nova couldn't get a cargo check
error in a timely manner. There's something to be said for having fast compilation times generally, but pragmatically, what exactly do we want from our CI, why do we want those things, and when do we want them?
cargo check is a great example to kick off this conversation: if cargo check errors are in the main branch, something has gone wrong. In any case, a complete build will catch it.

The script below assumes you have followed the aurae build instructions, have export RUSTC_WRAPPER=sccache in the shell you're running this script in, and have the following installed:
in the shell you're running this script in, and have the following installed:
sccache
clang
lld
mold
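A hypothetical preflight loop to check those prerequisites before committing to a long run (ld.lld is the usual binary name that the lld package installs):

```shell
# Report which of the required tools are actually on PATH.
for tool in sccache clang ld.lld mold; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "ok: $tool"
  else
    echo "missing: $tool"
  fi
done
```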
progs=(
"aer"
"auraed"
"auraescript"
)
linkers=(
"lld"
"mold"
)
echo "INFO: Running analysis"
mkdir -p profiling
echo "INFO: Cleaning"
make clean-gen clean-crates
echo "INFO: Finished cleaning"
echo "INFO: First run of cargo test --workspace --timings (no cache)"
cargo test --workspace --timings
mv target/cargo-timings/cargo-timing.html profiling/cargo-timing-no-cache.html
echo "INFO: Done with first run of cargo test"
echo "INFO: Second run of cargo test --workspace --timings (using cache)"
cargo test --workspace --timings
mv target/cargo-timings/cargo-timing.html profiling/cargo-timing-with-cache.html
echo "INFO: Done with second run of cargo test"
echo "INFO: Third run of cargo test --workspace --timings (using lld and cache)"
RUSTFLAGS="-Clinker=clang -Clink-arg=-fuse-ld=lld" cargo test --workspace --timings
mv target/cargo-timings/cargo-timing.html profiling/cargo-timing-lld-with-cache.html
echo "INFO: Done with third run of cargo test"
for prog in "${progs[@]}"; do
echo "INFO: Running $prog + default"
cargo +nightly rustc -p "$prog" --bins -- -Z time-passes &> "profiling/$prog-default.txt"
echo "INFO: Done with $prog + default"
done
for linker in "${linkers[@]}"; do
for prog in "${progs[@]}"; do
echo "INFO: Running $prog + $linker"
RUSTFLAGS="-Clinker=clang -Clink-arg=-fuse-ld=$linker" cargo +nightly rustc -p "$prog" --bins -- -Z time-passes &> "profiling/$prog-$linker.txt"
echo "INFO: Done with $prog + $linker"
done
done
echo "INFO: Results for timings with no cache can be found at $(pwd)/profiling/cargo-timing-no-cache.html"
echo "INFO: Results for timings with cache can be found at $(pwd)/profiling/cargo-timing-with-cache.html"
for prog in "${progs[@]}"; do
echo "INFO: Results for $prog + default time-passes can be found at $(pwd)/profiling/$prog-default.txt"
done
for linker in "${linkers[@]}"; do
for prog in "${progs[@]}"; do
echo "INFO: Results for $prog + $linker time-passes can be found at $(pwd)/profiling/$prog-$linker.txt"
done
done
echo "INFO: View in browser via file://$(pwd)/profiling"
my builds are still slow, but i'm not sure what we can do. maybe we should think about breaking out the monorepo...
I think there is opportunity for newcomers to the project to help with our current build times.
Right now our builds can take up to 15 minutes, and I believe there is a lot of repeated and duplicated work. The more compiling, downloading, and linking commands we run, the longer the builds.
How do we get build times down to just a few seconds? Maybe less than 60 seconds for the entire project?