astral-sh / uv

An extremely fast Python package installer and resolver, written in Rust.
https://astral.sh/
Apache License 2.0

Run resolve/install benchmarks in ci #3281

Closed ibraheemdev closed 1 week ago

ibraheemdev commented 2 weeks ago

Summary

Runs the resolver benchmarks in CI with CodSpeed.

codspeed-hq[bot] commented 2 weeks ago

CodSpeed Performance Report

Congrats! CodSpeed is installed 🎉

🆕 12 new benchmarks were detected.

You will start to see performance impacts in the reports once the benchmarks are run from your default branch.

Detected benchmarks

- `build_platform_tags[burntsushi-archlinux]` (6.3 ms)
- `wheelname_parsing[flyte-long-compatible]` (21 µs)
- `wheelname_parsing[flyte-long-incompatible]` (26.3 µs)
- `wheelname_parsing[flyte-short-compatible]` (11.9 µs)
- `wheelname_parsing[flyte-short-incompatible]` (12.2 µs)
- `wheelname_parsing_failure[flyte-long-extension]` (2.6 µs)
- `wheelname_parsing_failure[flyte-short-extension]` (2.6 µs)
- `wheelname_tag_compatibility[flyte-long-compatible]` (2.6 µs)
- `wheelname_tag_compatibility[flyte-long-incompatible]` (1.8 µs)
- `wheelname_tag_compatibility[flyte-short-compatible]` (2.5 µs)
- `wheelname_tag_compatibility[flyte-short-incompatible]` (1.1 µs)
- `resolve_warm_jupyter` (366.7 ms)

ibraheemdev commented 2 weeks ago

Hmm it doesn't look like the benchmarks are running correctly under Codspeed, the performance report is showing the resolve/install benchmarks running in microseconds.

adriencaccia commented 2 weeks ago

Hey @ibraheemdev, I am a co-founder at @CodSpeedHQ!

> Hmm it doesn't look like the benchmarks are running correctly under Codspeed, the performance report is showing the resolve/install benchmarks running in microseconds.

Yes, running arbitrary executables in a benchmark with CodSpeed will not give relevant results, as most of the compute happens in a new process that is not instrumented. It would be best to call the underlying library functions directly, without relying on the built executable.

For example, calling https://github.com/astral-sh/uv/blob/2af80c28a8e6a2da755ab78f3ea7b028e8b1510c/crates/uv/src/commands/pip_compile.rs#L52 instead of https://github.com/ibraheemdev/uv/blob/4ebdc40f60562c05559ac6331abe1a56275e2c8b/crates/bench/benches/uv.rs#L41-L42.
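The suggested change can be sketched roughly as below. Note that `pip_compile` here is a hypothetical stand-in for uv's actual library entry point, not its real signature:

```rust
use std::process::Command;
use std::time::Instant;

// Hypothetical stand-in for the library-level entry point
// (uv's real `pip_compile` takes many more arguments).
fn pip_compile(requirements: &[&str]) -> usize {
    requirements.len() // placeholder for the real resolution work
}

fn main() {
    // Anti-pattern under CodSpeed: the interesting work happens in a child
    // process that the valgrind-based instrumentation cannot see.
    let _ = Command::new("uv").args(["pip", "compile", "requirements.in"]).status();

    // Preferred: call the library function in-process, so the work being
    // measured runs inside the instrumented benchmark process itself.
    let start = Instant::now();
    let n = pip_compile(&["black", "jupyter"]);
    assert_eq!(n, 2);
    println!("resolved {n} requirements in {:?}", start.elapsed());
}
```

The spawned-process call is kept only to illustrate the anti-pattern; in a real benchmark it would be removed entirely.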

Hope that helps you a bit 😃

ibraheemdev commented 2 weeks ago

@adriencaccia Thanks! I suspected we would have to do this eventually, but didn't realize CodSpeed didn't support processing external commands at all.

adriencaccia commented 1 week ago

> Nice, thank you! Open to giving this a shot. Do we have any sense for what the variance/noise will look like?

I tested it out on my fork at https://github.com/adriencaccia/uv/pull/1, and I have the following variance results for 101 runs on the same commit:

Found 101 runs for adriencaccia/uv (fca26cde1b54f7467267ca4dff7a9b9cb6f10d29)
All of the benchmarks below live in `crates/bench/benches/distribution_filename.rs`:

| Benchmark | Average | Std. deviation | Variance coeff. | Range | Range coeff. |
| --- | --- | --- | --- | --- | --- |
| `wheelname_tag_compatibility[flyte-short-incompatible]` | 1.1 µs | 27.3 ns | 2.5% | 55.6 ns | 5.1% |
| `wheelname_tag_compatibility[flyte-short-compatible]` | 2.5 µs | 27.3 ns | 1.1% | 55.6 ns | 2.2% |
| `wheelname_tag_compatibility[flyte-long-compatible]` | 2.6 µs | 27.3 ns | 1.0% | 55.6 ns | 2.1% |
| `wheelname_tag_compatibility[flyte-long-incompatible]` | 1.8 µs | 13.6 ns | 0.7% | 27.8 ns | 1.5% |
| `wheelname_parsing_failure[flyte-short-extension]` | 2.6 µs | 13.6 ns | 0.5% | 27.8 ns | 1.1% |
| `wheelname_parsing_failure[flyte-long-extension]` | 2.6 µs | 13.6 ns | 0.5% | 27.8 ns | 1.1% |
| `wheelname_parsing[flyte-short-compatible]` | 12 µs | 13.6 ns | 0.1% | 27.8 ns | 0.2% |
| `wheelname_parsing[flyte-short-incompatible]` | 12.2 µs | 13.6 ns | 0.1% | 27.8 ns | 0.2% |
| `build_platform_tags[burntsushi-archlinux]` | 6.3 ms | 13.6 ns | 0.0% | 27.8 ns | 0.0% |
| `wheelname_parsing[flyte-long-incompatible]` | 26.3 µs | 0 ns | 0.0% | 0 s | 0.0% |
| `wheelname_parsing[flyte-long-compatible]` | 21 µs | 0 ns | 0.0% | 0 s | 0.0% |

It is fairly stable, so you should be able to set a low regression threshold of around 5% 🙂
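For reference, the variance coefficient reported in these tables is just the standard deviation expressed as a percentage of the mean. A minimal sketch, with made-up sample data:

```rust
// Compute mean, standard deviation, and variance coefficient (CV, in %)
// for a slice of per-run benchmark times. Sample data is illustrative.
fn stats(samples: &[f64]) -> (f64, f64, f64) {
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    // Population variance: mean of squared deviations from the mean.
    let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    let std_dev = var.sqrt();
    let cv = 100.0 * std_dev / mean;
    (mean, std_dev, cv)
}

fn main() {
    // Four hypothetical runs of the same benchmark, in milliseconds.
    let runs = [1.0, 1.1, 0.9, 1.0];
    let (mean, std_dev, cv) = stats(&runs);
    println!("mean={mean:.3} ms, sd={std_dev:.4} ms, cv={cv:.2}%");
    // A CV comfortably below 5% is what makes a ~5% regression
    // threshold reasonable without false alarms.
    assert!(cv < 10.0);
}
```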

charliermarsh commented 1 week ago

Awesome, thanks so much Adrien!

zanieb commented 1 week ago

Hm, I don't see the resolver benchmarks there. I'd expect the distribution-filename benches to be very stable, but the resolver ones are probably less so.

zanieb commented 1 week ago

Looks like there's something wrong and the resolver benches are missing on the latest commit.

ibraheemdev commented 1 week ago

I forgot to use the `codspeed-criterion-compat` shim in the uv benchmarks, but it looks like the crate doesn't support async benchmarks.
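When a harness only accepts synchronous benchmark closures, a common workaround is to drive the async future to completion inside the closure with a blocking executor (in practice you would reach for `tokio::runtime::Runtime::block_on` or similar; the hand-rolled version below is only a sketch of the idea using the standard library):

```rust
use std::future::Future;
use std::sync::{Arc, Condvar, Mutex};
use std::task::{Context, Poll, Wake, Waker};

// A waker that parks the current thread until the future signals progress.
struct ThreadSignal {
    woken: Mutex<bool>,
    cond: Condvar,
}

impl Wake for ThreadSignal {
    fn wake(self: Arc<Self>) {
        *self.woken.lock().unwrap() = true;
        self.cond.notify_one();
    }
}

// Minimal blocking executor: poll the future, sleep until woken, repeat.
fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = Box::pin(future);
    let signal = Arc::new(ThreadSignal {
        woken: Mutex::new(false),
        cond: Condvar::new(),
    });
    let waker = Waker::from(signal.clone());
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = future.as_mut().poll(&mut cx) {
            return out;
        }
        let mut woken = signal.woken.lock().unwrap();
        while !*woken {
            woken = signal.cond.wait(woken).unwrap();
        }
        *woken = false;
    }
}

fn main() {
    // A synchronous bench closure can now drive an async resolve:
    let result = block_on(async { 21 * 2 });
    assert_eq!(result, 42);
}
```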

adriencaccia commented 1 week ago

@ibraheemdev let me know when you want me to run variance checks again on the new benchmarks

ibraheemdev commented 1 week ago

@adriencaccia can you run them now? I'm also curious why benchmarks seem to be running ~15x slower on CodSpeed than locally, is it using an aggregate time instead of per-run?

adriencaccia commented 1 week ago

> @adriencaccia can you run them now?

Alright, I started them. Will post the results once they are done 😉

> I'm also curious why benchmarks seem to be running ~15x slower on CodSpeed than locally, is it using an aggregate time instead of per-run?

This is because we run the code under valgrind, which adds a 4x to 10x overhead, sometimes more. But that is how we get those consistent measurements and flamegraphs 😉

adriencaccia commented 1 week ago

Results with the new benchmarks:

Found 101 runs for adriencaccia/uv (7bbc18a361ba078e21186db90b98d6b88b3a8a7c)
The two `resolve_warm_*` benchmarks live in `crates/bench/benches/uv.rs`; the rest live in `crates/bench/benches/distribution_filename.rs`:

| Benchmark | Average | Std. deviation | Variance coeff. | Range | Range coeff. |
| --- | --- | --- | --- | --- | --- |
| `resolve_warm_black` | 15.3 ms | 527.5 µs | 3.4% | 2.3 ms | 15.0% |
| `wheelname_tag_compatibility[flyte-short-incompatible]` | 1.1 µs | 27.5 ns | 2.5% | 55.6 ns | 5.1% |
| `resolve_warm_jupyter` | 366.5 ms | 4.2 ms | 1.1% | 22.8 ms | 6.2% |
| `wheelname_tag_compatibility[flyte-short-compatible]` | 2.5 µs | 27.5 ns | 1.1% | 55.6 ns | 2.2% |
| `wheelname_tag_compatibility[flyte-long-compatible]` | 2.6 µs | 27.5 ns | 1.0% | 55.6 ns | 2.1% |
| `wheelname_tag_compatibility[flyte-long-incompatible]` | 1.9 µs | 13.7 ns | 0.7% | 27.8 ns | 1.5% |
| `wheelname_parsing_failure[flyte-short-extension]` | 2.6 µs | 13.7 ns | 0.5% | 27.8 ns | 1.1% |
| `wheelname_parsing_failure[flyte-long-extension]` | 2.6 µs | 13.7 ns | 0.5% | 27.8 ns | 1.1% |
| `wheelname_parsing[flyte-short-compatible]` | 12 µs | 13.7 ns | 0.1% | 27.8 ns | 0.2% |
| `wheelname_parsing[flyte-short-incompatible]` | 12.2 µs | 13.7 ns | 0.1% | 27.8 ns | 0.2% |
| `build_platform_tags[burntsushi-archlinux]` | 6.3 ms | 13.7 ns | 0.0% | 27.8 ns | 0.0% |
| `wheelname_parsing[flyte-long-incompatible]` | 26.3 µs | 0 ns | 0.0% | 0 s | 0.0% |
| `wheelname_parsing[flyte-long-compatible]` | 21 µs | 0 ns | 0.0% | 0 s | 0.0% |

Indeed, it seems that `crates/bench/benches/uv.rs::uv::resolve_warm_black::resolve_warm_black` is a bit more inconsistent.

charliermarsh commented 1 week ago

Maybe we remove the Black test? It seems like the variance is way higher than for the Jupyter test.

zanieb commented 1 week ago

I wonder why that is. It shouldn't be that different? (as far as variance)

ibraheemdev commented 1 week ago

@zanieb It's probably that the actual resolve step is faster, so the benchmark is more influenced by other factors (file I/O, etc.)
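A back-of-the-envelope illustration of that point (the 2 ms of fixed per-run noise is an assumption for the sake of the example, not a measured value): the same absolute jitter is a much larger fraction of the ~15 ms Black resolve than of the ~366 ms Jupyter one.

```rust
// What fraction of a benchmark's total runtime a fixed amount of
// absolute noise (file I/O etc.) accounts for, in percent.
fn noise_share(noise_ms: f64, total_ms: f64) -> f64 {
    100.0 * noise_ms / total_ms
}

fn main() {
    let black = noise_share(2.0, 15.3); // ~13% of the Black resolve
    let jupyter = noise_share(2.0, 366.5); // ~0.5% of the Jupyter resolve
    assert!(black > 10.0 && jupyter < 1.0);
    println!("black: {black:.1}%, jupyter: {jupyter:.1}%");
}
```

So even with identical absolute noise, the shorter benchmark would show a far larger variance coefficient.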

ibraheemdev commented 1 week ago

I'm going to go ahead and merge this with just the jupyter benchmark. We'll see how consistent/useful the reports are.