PRQL / prql

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement
https://prql-lang.org
Apache License 2.0
9.94k stars 218 forks source link

`cargo build -p prqlc` isn't really cached after `cargo build` #3098

Closed max-sixty closed 1 year ago

max-sixty commented 1 year ago

What's up?

cargo clean
cargo build # takes 30 seconds
cargo build -p prqlc # takes 17 seconds — *not 0 seconds!*
cargo clean
cargo build -p prqlc # takes 22 seconds.

So the really curious thing is why running cargo build hardly helps with cargo build -p prqlc given it presumably builds prqlc as part of its build.

This hits us in the CI runtime of building prqlc.

Is it something to do with the dependencies being slightly different — e.g. features being different?

max-sixty commented 1 year ago

I tried aligning the dependency features etc, but it didn't seem to help.

diff --git a/crates/prql-compiler/Cargo.toml b/crates/prql-compiler/Cargo.toml
index 4b6745e2..e0748a90 100644
--- a/crates/prql-compiler/Cargo.toml
+++ b/crates/prql-compiler/Cargo.toml
@@ -11,15 +11,15 @@ version.workspace = true
 metadata.msrv = "1.65.0"

 [features]
-default = []
+default = ["serde_yaml"]
 # Technically tokio could be limited to external tests, but its types are in
 # signatures which would require lots of conditional compilation.
 test-dbs = ["duckdb", "rusqlite", "tokio"]
 test-dbs-external = ["chrono", "duckdb", "mysql", "pg_bigdecimal", "postgres", "rusqlite", "tiberius", "tokio", "tokio-util"]

 [dependencies]
-prql-ast = {path = "../prql-ast", version = "0.9.2" }
-prql-parser = {path = "../prql-parser", version = "0.9.2" }
+prql-ast = {path = "../prql-ast", version = "0.9.2"}
+prql-parser = {path = "../prql-parser", version = "0.9.2"}

 anstream = {version = "0.3.2", features = ["auto"]}
 anyhow = {version = "1.0.57", features = ["backtrace"]}
@@ -39,7 +39,7 @@ sqlparser = {version = "0.36.0", features = ["serde"]}
 strum = {version = "0.25.0", features = ["std", "derive"]}
 strum_macros = "0.25.0"

-serde_yaml = {version = "0.9", optional = true}
+serde_yaml = {version = "0.9.1", optional = true}

 [target.'cfg(not(target_family="wasm"))'.dependencies]
 # For integration tests. These are not listed as dev-dependencies because
diff --git a/crates/prqlc/Cargo.toml b/crates/prqlc/Cargo.toml
index 9623942a..f99603d0 100644
--- a/crates/prqlc/Cargo.toml
+++ b/crates/prqlc/Cargo.toml
@@ -12,7 +12,7 @@ metadata.msrv = "1.65.0"

 [target.'cfg(not(target_family="wasm"))'.dependencies]
 anstream = {version = "0.3.2", features = ["auto"]}
-anyhow = {version = "1.0.57"}
+anyhow = {version = "1.0.57", features = ["backtrace"]}
 ariadne = "0.3.0"
 clap = {version = "4.3.0", features = ["derive", "env", "wrap_help"]}
 clap_complete_command = "0.5.1"
@@ -23,10 +23,10 @@ colorchoice-clap = "1.0.0"
 env_logger = {version = "0.10.0", features = ["color"]}
 itertools = "0.11.0"
 notify = "^6.0.0"
-prql-ast = {path = '../prql-ast', version = "0.9.2" }
-prql-compiler = {path = '../prql-compiler', features = ["serde_yaml"], version = "0.9.2" }
+prql-ast = {path = '../prql-ast', version = "0.9.2"}
+prql-compiler = {path = '../prql-compiler', features = ["serde_yaml"], version = "0.9.2"}
 regex = {version = "1.9.0", features = ["std", "unicode"]}
-serde = "^1"
+serde = {version = "1.0.137", features = ["derive"]}
 serde_json = "1.0.81"
 serde_yaml = "0.9.1"
 walkdir = "^2.3.2"
aljazerzen commented 1 year ago

I've noticed this even in smaller project without features and a bunch of dependencies. This might help: https://stackoverflow.com/questions/70174147/how-do-i-make-cargo-show-what-files-are-causing-a-rebuild

max-sixty commented 1 year ago

A description consistent with https://github.com/PRQL/prql/pull/3115#issue-1823056908 is here

However, one drawback is that this can increase build times because the dependency is built multiple times (each with different features). When using the version "2" resolver, it is recommended to check for dependencies that are built multiple times to reduce overall build time. If it is not required to build those duplicated packages with separate features, consider adding features to the features list in the dependency declaration so that the duplicates end up with the same features (and thus Cargo will build it only once). You can detect these duplicate dependencies with the cargo tree --duplicates command. It will show which packages are built multiple times; look for any entries listed with the same version. See Inspecting resolved features for more on fetching information on the resolved features. For build dependencies, this is not necessary if you are cross-compiling with the --target flag because build dependencies are always built separately from normal dependencies in that scenario.


For locality, copying https://github.com/PRQL/prql/pull/3115#issue-1823056908 here:


While attempting to look at https://github.com/PRQL/prql/issues/3098, I tried to look at what was causing the builds the second time around.

Here's a fish script (easy to adjust for bash):

begin;
    echo 'Running: cargo +nightly clean'; cargo +nightly clean ;
    echo 'Running first build'; cargo +nightly build --verbose --all-targets;
    echo 'Running second build'; cargo +nightly build --verbose --all-targets -p prqlc;
end &| tee build.log

Then there's two lines in build.log for the same dependency, one from the first; one from the second:

    Running `/Users/maximilian/.rustup/toolchains/nightly-aarch64-apple-darwin/bin/rustc --crate-name serde /Users/maximilian/.cargo/registry/src/index.crates.io-6f17d22bba15001f/serde-1.0.163/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type lib --emit=dep-info,metadata,link -C embed-bitcode=no -C debuginfo=2 -C split-debuginfo=unpacked --cfg 'feature="default"' --cfg 'feature="derive"' --cfg 'feature="serde_derive"' --cfg 'feature="std"' -C metadata=5e334edc106c7255 -C extra-filename=-5e334edc106c7255 --out-dir /Users/maximilian/workspace/prql/target/debug/deps -L dependency=/Users/maximilian/workspace/prql/target/debug/deps --extern serde_derive=/Users/maximilian/workspace/prql/target/debug/deps/libserde_derive-6e3b03dd545079c5.dylib --cap-lints allow`
    Running `/Users/maximilian/.rustup/toolchains/nightly-aarch64-apple-darwin/bin/rustc --crate-name serde /Users/maximilian/.cargo/registry/src/index.crates.io-6f17d22bba15001f/serde-1.0.163/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type lib --emit=dep-info,metadata,link -C embed-bitcode=no -C debuginfo=2 -C split-debuginfo=unpacked --cfg 'feature="alloc"' --cfg 'feature="default"' --cfg 'feature="derive"' --cfg 'feature="serde_derive"' --cfg 'feature="std"' -C metadata=7fbd537ea76b6b2f -C extra-filename=-7fbd537ea76b6b2f --out-dir /Users/maximilian/workspace/prql/target/debug/deps -L dependency=/Users/maximilian/workspace/prql/target/debug/deps --extern serde_derive=/Users/maximilian/workspace/prql/target/debug/deps/libserde_derive-6e3b03dd545079c5.dylib --cap-lints allow`

These have different features — only one has alloc. And then running:

cargo tree -e features --target=(default-target) -i serde

...helps find the thing that's generating the request for the alloc feature — in this case wasm-bindgen.

This seems like a very labor-intensive way of trying to unify dependencies between each crate & the workspace. But I'm including the first fix, which is to not compile wasm-bindgen for non-wasm targets.

So:

max-sixty commented 1 year ago

Closing as "a known problem which we can't do much about". The description at the top of the previous comment explains it well.