aurae-runtime / aurae

Distributed systems runtime daemon written in Rust.
https://aurae.io
Apache License 2.0

Inefficient Builds (15 mins) #461

Open krisnova opened 1 year ago

krisnova commented 1 year ago

I think there is opportunity for newcomers to the project to help with our current build times.

Right now our builds can take up to 15 minutes, and I believe there is a lot of repeated and duplicated work. The more compiling, downloading, and linking we do, the longer the builds take.

How do we get build times down to just a few seconds? Maybe less than 60 seconds for the entire project?


krisnova commented 1 year ago

Was able to get a 30-minute build... ouch

dmah42 commented 1 year ago

naive attempt:

$ rustup default nightly
$ export RUSTFLAGS=-Zself-profile
$ make all

unfortunately this build fails due to tonic generating code that fails the nightly clippy checks. adding an #![allow(clippy::all)] to the "./proto/src/lib.rs" file helps get past that.
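for anyone reproducing this, a one-liner to prepend that attribute (a sketch; GNU sed assumed):

$ # prepend the crate-level allow so nightly clippy ignores the generated code
$ sed -i '1i #![allow(clippy::all)]' proto/src/lib.rs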

aer falls foul of some other clippy checks that would need judicious allow statements. in the meantime i'll see if i can get results just for auraed.

$ cd auraed && cargo rustc --bin auraed -- -Zself-profile

works but only for the final crate. let's go deeper:

$ RUSTFLAGS="-Zself-profile=$(pwd)/profiles -Zself-profile-events=default,args" cargo build --bin auraed
$ crox --minimum-duration 10 --dir $(pwd)/profiles

oh look. a 2.4 GB JSON file for the chrome profiler to play with. which it rightly refused to load.

increasing the minimum duration to 3000 (3ms) helps. now it's only 23MB.

summary: auraed takes about 5.5 mins to build clean, with all but 47 secs of that being dependencies. worst offenders are tonic, netlink_proto, axum, and protobuf. i.e., nothing surprising.

so the slowness is likely that we have multiple separate crates and CI build steps, each of which defines its own set of dependencies that need checking, maybe downloading, and maybe building. if each takes 2 mins, we're already over 30 mins.

naively i'd expect that if we merge these into a single crate we'd end up spending less time dealing with dependencies but having less flexibility to build things individually.
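one way to sanity-check how much the crates' dependency sets actually overlap before merging anything (rough sketch; the /tmp paths are arbitrary):

$ # flatten each crate's dependency tree and count the packages they share
$ cargo tree -p auraed --prefix none | sort -u > /tmp/deps-auraed.txt
$ cargo tree -p auraescript --prefix none | sort -u > /tmp/deps-auraescript.txt
$ comm -12 /tmp/deps-auraed.txt /tmp/deps-auraescript.txt | wc -l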

future-highway commented 1 year ago

> naively i'd expect that if we merge these into a single crate we'd end up spending less time dealing with dependencies but having less flexibility to build things individually.

The guidance for Rust is usually to break projects up into small crates since crates that don't depend on each other can be built in parallel.

By looking at an action that missed the cache and picking a single dependency (I'm picking serde under the assumption that it is a common dependency; it appears 44x in our Cargo.lock), we can get some hints on how Cargo reuses its output.
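For reference, that 44x figure can presumably be reproduced with a plain substring count:

$ grep -c serde Cargo.lock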

The following log lines appear for serde:

cargo build --target x86_64-unknown-linux-musl -p auraed
Downloaded serde v1.0.152
Compiling serde v1.0.152
...
cargo build  -p auraescript
Compiling serde v1.0.152
...
cargo build  -p aer
Compiling serde v1.0.152
...
cargo clippy --all-features --workspace --exclude auraed --exclude auraescript --exclude aer  -- -D clippy::all -D warnings
Checking serde v1.0.152
...
cargo clippy --target x86_64-unknown-linux-musl -p auraed --all-features -- -D clippy::all -D warnings
Checking serde v1.0.152
...
cargo clippy  -p auraescript --all-features -- -D clippy::all -D warnings
Checking serde v1.0.152
...
cargo clippy  -p aer --all-features -- -D clippy::all -D warnings
# serde does not get checked even though `aer` has a dependency on it
...
cargo install --target x86_64-unknown-linux-musl --path ./auraed --debug --force
Downloaded serde v1.0.158
Compiling serde v1.0.158
...
cargo install  --path ./auraescript --debug --force
Compiling serde v1.0.158

Notes:

It looks like we can eliminate two of the cargo clippy calls (the individual auraescript and aer passes) by dropping the corresponding --exclude flags from the workspace-level clippy invocation.

It is also possible that we can add cargo build --workspace --exclude auraed and eliminate the auraescript/aer individual builds.
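A rough sketch of what the consolidated invocations could look like, assuming only auraed actually needs the musl target (as in the logs above):

$ # one shared dependency graph for everything that doesn't need musl
$ cargo build --workspace --exclude auraed
$ cargo build --target x86_64-unknown-linux-musl -p auraed
$ # a single clippy pass, plus the musl-target pass for auraed
$ cargo clippy --workspace --exclude auraed --all-features -- -D clippy::all -D warnings
$ cargo clippy --target x86_64-unknown-linux-musl -p auraed --all-features -- -D clippy::all -D warnings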

BradLugo commented 1 year ago

TL;DR: We should discuss why, what, and when we run certain CI commands. More to the point, the analysis below points at a few places where we can improve the build times: dependency caching (e.g., sccache), a faster linker (lld or mold), and consolidating the integration tests into a single binary.

A script for replicating the analysis from this post can be found at the bottom.


Analysis

From my initial investigation, it seems that the most time-consuming operations are dependency management (particularly on fresh builds, as you might expect), the tests, and linking.

Preliminary details

During this investigation, it became apparent that I didn't understand cargo's caching mechanism and future-highway's comment confirmed it. I couldn't figure out how to handle the cargo build cache across the aurae workspace, so I'll be using sccache.
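A minimal local sccache setup, for anyone who wants to follow along (sccache is also available from most package managers):

$ cargo install sccache
$ export RUSTC_WRAPPER=sccache
$ cargo build --workspace
$ sccache --show-stats    # check the compile-cache hit rate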

This analysis was performed on a qemu Ubuntu VM with 4 vCPUs from an i9-9980HK and 16 GB of memory allocated to the VM.

Results and notes

Cargo test timings

Results from cargo test --workspace --timings after a cargo clean (showing 30s+ units): [timing report screenshot]

Results from cargo test --workspace --timings with cached dependencies (showing 30s+ units): [timing report screenshot]

Results from RUSTFLAGS="-Clinker=clang -Clink-arg=-fuse-ld=lld -Cincremental=false" cargo test --workspace --timings with cached dependencies (more on linkers in a moment, showing 10s+ units): [timing report screenshot]

Linker timings

Linking time with the default linker:

❯ rg '(link|total)' aer-default.txt
173:time:   0.158; rss:  173MB ->  173MB (   +0MB)      link_binary_check_files_are_writeable
174:time:  58.611; rss:  173MB ->  173MB (   +0MB)      run_linker
175:time:   0.162; rss:  173MB ->  173MB (   +0MB)      link_binary_remove_temps
176:time:  58.932; rss:  173MB ->  173MB (   +0MB)      link_binary
177:time:  58.937; rss:  173MB ->  114MB (  -59MB)      link_crate
178:time:  59.534; rss:  184MB ->  114MB (  -70MB)      link
179:time:  63.064; rss:   29MB ->   97MB (  +68MB)      total
❯ rg '(link|total)' auraed-default.txt
103:   Compiling netlink-packet-utils v0.5.1
112:   Compiling netlink-packet-core v0.4.2
130:   Compiling netlink-sys v0.8.4
144:   Compiling netlink-proto v0.10.0
151:   Compiling netlink-packet-route v0.13.0
166:   Compiling rtnetlink v0.11.0
220:time:   0.224; rss:  298MB ->  298MB (   +0MB)      link_binary_check_files_are_writeable
221:time: 152.282; rss:  298MB ->  295MB (   -4MB)      run_linker
222:time:   0.298; rss:  136MB ->  136MB (   +0MB)      link_binary_remove_temps
223:time: 152.823; rss:  298MB ->  136MB ( -162MB)      link_binary
224:time: 152.823; rss:  298MB ->  136MB ( -162MB)      link_crate
225:time: 153.706; rss:  338MB ->  136MB ( -202MB)      link
226:time: 165.519; rss:   29MB ->  104MB (  +76MB)      total
❯ rg '(link|total)' auraescript-default.txt
201:time:   0.058; rss:  155MB ->  155MB (   +0MB)      link_binary_check_files_are_writeable
202:time: 170.649; rss:  155MB ->  155MB (   +1MB)      run_linker
203:time:   0.071; rss:  146MB ->  146MB (   +0MB)      link_binary_remove_temps
204:time: 170.781; rss:  155MB ->  146MB (   -9MB)      link_binary
205:time: 170.781; rss:  155MB ->  146MB (   -9MB)      link_crate
206:time: 171.001; rss:  155MB ->  146MB (   -9MB)      link
207:time: 173.885; rss:   29MB ->   96MB (  +68MB)      total

Linking time with lld:

❯ rg '(link|total)' aer-lld.txt
171:time:   0.169; rss:  143MB ->  143MB (   +0MB)      link_binary_check_files_are_writeable
172:time:   9.499; rss:  143MB ->  143MB (   +0MB)      run_linker
173:time:   0.173; rss:  143MB ->  143MB (   +0MB)      link_binary_remove_temps
174:time:   9.844; rss:  143MB ->  143MB (   +0MB)      link_binary
175:time:   9.844; rss:  143MB ->  143MB (   +0MB)      link_crate
176:time:  10.830; rss:  143MB ->  143MB (   +0MB)      link
177:time:  13.903; rss:   29MB ->  102MB (  +73MB)      total
❯ rg '(link|total)' auraed-lld.txt
104:   Compiling netlink-packet-utils v0.5.1
110:   Compiling netlink-packet-core v0.4.2
131:   Compiling netlink-sys v0.8.4
147:   Compiling netlink-proto v0.10.0
152:   Compiling netlink-packet-route v0.13.0
165:   Compiling rtnetlink v0.11.0
219:time:   0.270; rss:  216MB ->  216MB (   +0MB)      link_binary_check_files_are_writeable
220:time:  19.198; rss:  216MB ->  216MB (   +0MB)      run_linker
221:time:   0.290; rss:  124MB ->  124MB (   +0MB)      link_binary_remove_temps
222:time:  19.767; rss:  216MB ->  124MB (  -92MB)      link_binary
223:time:  19.767; rss:  216MB ->  124MB (  -92MB)      link_crate
224:time:  21.418; rss:  281MB ->  124MB ( -157MB)      link
225:time:  27.115; rss:   29MB ->   95MB (  +67MB)      total
❯ rg '(link|total)' auraescript-lld.txt
200:time:   0.053; rss:  135MB ->  135MB (   +0MB)      link_binary_check_files_are_writeable
201:time:  16.251; rss:  135MB ->  135MB (   +1MB)      run_linker
202:time:   0.052; rss:  129MB ->  129MB (   +0MB)      link_binary_remove_temps
203:time:  16.358; rss:  135MB ->  129MB (   -6MB)      link_binary
204:time:  16.358; rss:  135MB ->  129MB (   -6MB)      link_crate
205:time:  16.706; rss:  135MB ->  129MB (   -6MB)      link
206:time:  19.145; rss:   29MB ->   95MB (  +67MB)      total

Linking time with mold:

❯ rg '(link|total)' aer-mold.txt
167:time:   0.173; rss:  143MB ->  143MB (   +0MB)      link_binary_check_files_are_writeable
168:time:   9.127; rss:  143MB ->  143MB (   +0MB)      run_linker
169:time:   0.206; rss:  113MB ->  113MB (   +0MB)      link_binary_remove_temps
170:time:   9.510; rss:  143MB ->  113MB (  -30MB)      link_binary
171:time:   9.510; rss:  143MB ->  113MB (  -30MB)      link_crate
172:time:  10.497; rss:  143MB ->  113MB (  -30MB)      link
173:time:  13.286; rss:   29MB ->  102MB (  +74MB)      total
❯ rg '(link|total)' auraed-mold.txt
104:   Compiling netlink-packet-utils v0.5.1
108:   Compiling netlink-packet-core v0.4.2
130:   Compiling netlink-sys v0.8.4
146:   Compiling netlink-proto v0.10.0
151:   Compiling netlink-packet-route v0.13.0
162:   Compiling rtnetlink v0.11.0
219:time:   0.275; rss:  213MB ->  213MB (   +0MB)      link_binary_check_files_are_writeable
220:time:  19.409; rss:  210MB ->  210MB (   +0MB)      run_linker
221:time:   0.339; rss:  120MB ->  120MB (   +0MB)      link_binary_remove_temps
222:time:  20.033; rss:  213MB ->  120MB (  -93MB)      link_binary
223:time:  20.033; rss:  213MB ->  120MB (  -93MB)      link_crate
224:time:  21.605; rss:  279MB ->  120MB ( -159MB)      link
225:time:  26.949; rss:   29MB ->   95MB (  +66MB)      total
❯ rg '(link|total)' auraescript-mold.txt
201:time:   0.054; rss:  135MB ->  135MB (   +0MB)      link_binary_check_files_are_writeable
202:time:  18.446; rss:  135MB ->  135MB (   +1MB)      run_linker
203:time:   0.064; rss:  135MB ->  135MB (   +0MB)      link_binary_remove_temps
204:time:  18.565; rss:  135MB ->  135MB (   +1MB)      link_binary
205:time:  18.565; rss:  135MB ->  135MB (   +1MB)      link_crate
206:time:  18.898; rss:  135MB ->  135MB (   +1MB)      link
207:time:  21.611; rss:   29MB ->   96MB (  +67MB)      total

An important note regarding linkers is that cargo creates a separate binary for each integration test file, and every one of those binaries goes through the whole linking rigamarole. This is why I included the lld run in the above cargo test timings. One way to circumvent a large portion of this problem is to use a single test binary, i.e., put all the tests behind one entry point, as sketched below.
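A minimal sketch of that layout (the module names here are purely hypothetical):

$ # hypothetical layout: one integration-test entry point; former test files become submodules
$ ls auraed/tests
cells  main.rs  observe
$ head -3 auraed/tests/main.rs
// Single integration-test binary: cargo only links this once.
mod cells;     // lives at auraed/tests/cells/mod.rs
mod observe;   // lives at auraed/tests/observe/mod.rs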

Another important note is that I couldn't get mold to work properly with the integration tests. Needs further investigation.
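If we settle on lld, one way to make it the default for local builds might be a .cargo/config.toml entry along these lines (a sketch; it assumes clang and lld are installed, and it only covers the gnu target, not the musl target auraed uses in CI):

$ cat >> .cargo/config.toml <<'EOF'
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=lld"]
EOF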

Better caching

As I mentioned in the beginning, I don't quite understand how cargo caches dependencies across a workspace, but at this point, I suspect it doesn't reuse as much as we'd like across these separate invocations. We might be able to figure out a better keying method for our current caching mechanism in our CI, or swap to sccache after better understanding how all these caches work.
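One thing that might help with that understanding is asking cargo directly why it considers a dependency dirty between the per-crate invocations (a sketch; the fingerprint log target is an internal cargo detail and its output may change):

$ # log cargo's rebuild-fingerprint decisions while rebuilding a single crate
$ CARGO_LOG=cargo::core::compiler::fingerprint=info cargo build -p aer 2>&1 | grep -i dirty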

Putting it all together

This isn't in the script below.

Here's the timing after implementing caching with sccache, linking via lld, and combining the integration tests (showing all 5s+ units): [timing report screenshot]

Total time: 201.1s (3m 21.1s)

The "what," "why," and "when" in aurae's CI

A great constructive exercise we can do as part of this ticket is to discuss how we're using CI. I was watching Nova's stream when this ticket was made, and it seemed like the root problem was that Nova couldn't get a cargo check error in a timely manner. There's something to be said for having fast compilation times generally, but pragmatically, what exactly do we want from our CI, why do we want those things, and when do we want them?

cargo check is a great example with which to kick off this conversation: what do we want it to catch, and how quickly do we want that feedback?
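For instance, a cheap fast-feedback job could surface most compile errors in a fraction of the time the full matrix takes, something like:

$ # type-checks the whole workspace without codegen or linking
$ cargo check --workspace --all-targets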

Script

Assumes you have followed the aurae build instructions, have export RUSTC_WRAPPER=sccache in the shell you're running this script in, and have the following installed: sccache, clang, lld, mold, and a nightly Rust toolchain.

#!/usr/bin/env bash
# Build-profiling helper: times cold/warm `cargo test` runs and collects
# -Z time-passes output for each binary with the default linker, lld, and mold.

progs=(
    "aer"
    "auraed"
    "auraescript"
)

linkers=(
    "lld"
    "mold"
)

echo "INFO: Running analysis"
mkdir -p profiling

echo "INFO: Cleaning"
make clean-gen clean-crates
echo "INFO: Finished cleaning"

echo "INFO: First run of cargo test --workspace --timings (no cache)"
cargo test --workspace --timings
mv target/cargo-timings/cargo-timing.html profiling/cargo-timing-no-cache.html
echo "INFO: Done with first run of cargo test"

echo "INFO: Second run of cargo test --workspace --timings (using cache)"
cargo test --workspace --timings
mv target/cargo-timings/cargo-timing.html profiling/cargo-timing-with-cache.html
echo "INFO: Done with second run of cargo test"

echo "INFO: Third run of cargo test --workspace --timings (using lld and cache)"
RUSTFLAGS="-Clinker=clang -Clink-arg=-fuse-ld=lld" cargo test --workspace --timings
mv target/cargo-timings/cargo-timing.html profiling/cargo-timing-lld-with-cache.html
echo "INFO: Done with third run of cargo test"

for prog in "${progs[@]}"; do
    echo "INFO: Running $prog + default"
    cargo +nightly rustc -p "$prog" --bins -- -Z time-passes &> "profiling/$prog-default.txt"
    echo "INFO: Done with $prog + default"
done
for linker in "${linkers[@]}"; do
    for prog in "${progs[@]}"; do
        echo "INFO: Running $prog + $linker"
        RUSTFLAGS="-Clinker=clang -Clink-arg=-fuse-ld=$linker" cargo +nightly rustc -p "$prog" --bins -- -Z time-passes &> "profiling/$prog-$linker.txt"
        echo "INFO: Done with $prog + $linker"
    done
done

echo "INFO: Results for timings with no cache can be found at $(pwd)/profiling/cargo-timing-no-cache.html"
echo "INFO: Results for timings with cache can be found at $(pwd)/profiling/cargo-timing-with-cache.html"
for prog in "${progs[@]}"; do
    echo "INFO: Results for $prog + default time-passes can be found at $(pwd)/profiling/$prog-default.txt"
done
for linker in "${linkers[@]}"; do
    for prog in "${progs[@]}"; do
        echo "INFO: Results for $prog + $linker time-passes can be found at $(pwd)/profiling/$prog-$linker.txt"
    done
done

echo "INFO: View in browser via file://$(pwd)/profiling"
dmah42 commented 4 months ago

my builds are still slow, but i'm not sure what we can do. maybe we should think about breaking out the monorepo...