Closed aatifsyed closed 11 months ago
Thank you for your PR!
I indeed misread the API documentation of buffered
.
Limiting to the physical CPUs was intended, most people run this on their local machines and still want to be able to work while the (then very long) checks were running.
Let me bench the PR, and then let's discuss :)
My results appear to be very different on 40k loc project using the following command:
cargo b --release; hyperfine --show-output -r 2 "./target/release/cargo-spellcheck check -j16 --code 0 ../secret/project"
Current main branch
# git 8e4cfcce7f8cf3984664feba44e90dabe9d9fcf1
Benchmark 1: ./target/release/cargo-spellcheck check -j16 --code 0 ../gensyn/node
Time (mean ± σ): 191.1 ms ± 5.7 ms [User: 246.0 ms, System: 55.1 ms]
Range (min … max): 187.1 ms … 195.2 ms 2 runs
# git 6c50e07efa303aa37fff73be1d7b187b36dc3e22
Benchmark 1: ./target/release/cargo-spellcheck check -j16 --code 0 ../gensyn/node
Time (mean ± σ): 184.6 ms ± 3.2 ms [User: 236.8 ms, System: 58.1 ms]
Range (min … max): 182.4 ms … 186.9 ms 2 runs
which aligns with the CPU utilization I see - it's only one core regardless.
Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
---|---|---|---|---|
./cargo-spellcheck-8e4cfcce7f8cf3984664feba44e90dabe9d9fcf1 -j 16 --code 0 ../gensyn/node |
180.1 ± 6.3 | 170.1 | 191.2 | 1.01 ± 0.05 |
./cargo-spellcheck-6c50e07efa303aa37fff73be1d7b187b36dc3e22 -j 16 --code 0 ../gensyn/node |
178.5 ± 6.7 | 171.0 | 191.0 | 1.00 |
hyperfine --show-output "./cargo-spellcheck-8e4cfcce7f8cf3984664feba44e90dabe9d9fcf1 -j 16 --code 0 ../gensyn/node" "./cargo-spellcheck-6c50e07efa303aa37fff73be1d7b187b36dc3e22 -j 16 --code 0 ../gensyn/node" --export-markdown hyperfine.md
Benchmark 1: ./cargo-spellcheck-8e4cfcce7f8cf3984664feba44e90dabe9d9fcf1 -j 16 --code 0 ../gensyn/node
Time (mean ± σ): 180.1 ms ± 6.3 ms [User: 238.3 ms, System: 59.6 ms]
Range (min … max): 170.1 ms … 191.2 ms 15 runs
Benchmark 2: ./cargo-spellcheck-6c50e07efa303aa37fff73be1d7b187b36dc3e22 -j 16 --code 0 ../x
Time (mean ± σ): 178.5 ms ± 6.7 ms [User: 235.5 ms, System: 57.5 ms]
Range (min … max): 171.0 ms … 191.0 ms 15 runs
Summary
./cargo-spellcheck-6c50e07efa303aa37fff73be1d7b187b36dc3e22 -j 16 --code 0 ../gensyn/node ran
1.01 ± 0.05 times faster than ./cargo-spellcheck-8e4cfcce7f8cf3984664feba44e90dabe9d9fcf1 -j 16 --code 0 ../x
I do understand that your code should be faster, but practically it's just about even. I think it'd be worth looking into a) why our outcomes are different and b) why there is only one CPU core used.
@aatifsyed see the above monologue :)
Interesting - just to clarify my understanding, on this branch you're seeing only a single core used?
Interesting - just to clarify my understanding, on this branch you're seeing only a single core used?
Yes, exactly. The pattern differs, but there is only ever one CPU used at 100% with this branch. I suspected it being IO bound or a lock in stdout
(hypothesis).
That's very surprising.
I've prepared tracing on two branches for you: https://github.com/aatifsyed/cargo-spellcheck/tree/parallel-traced (my code) https://github.com/aatifsyed/cargo-spellcheck/tree/single-thread-traced (pre-existing code)
Running each will give you a trace-<unix_seconds>.json
that you can view at https://ui.perfetto.dev, or chome://tracing in a chromium browser.
Here are the traces on my machine (for a bare cargo spellcheck
invocation), on https://github.com/ChainSafe/forest/:
$ loc
--------------------------------------------------------------------------------
Language Files Lines Blank Comment Code
--------------------------------------------------------------------------------
Rust 313 51683 5975 5028 40680
Markdown 27 4669 823 0 3846
...
parallel-traced (my code)
you can see the multiple threads of execution, taking ~6s
trace-parallel.json.txt
single-thread-traced (pre-existing code)
you can see that there only a single thread, taking ~45s
trace-single-threaded.json.txt
Want to try showing the traces from runs on your machine?
I suspected it being IO bound or a lock in stdout (hypothesis).
I'd be surpised if this was the case!
parallel.json.txt vanilla.json.txt
I am on fedora 38 AMD 64 with a 16 core processor. Are you on Mac OS X?
forst-parallel.json.txt forest-vanilla.json.txt
When running on forest
, I indeed see the ~10x speedup.
I'll have to debug why this is not the case for the private repo, but I don't see any obstacles to merging this as is.
Could you open up follow up PR introducing the tracing
?
Alright, I think I figured the root cause, it's rather unintuitive cli flag interpretation.
What does this PR accomplish?
comparison with
cargo-spellcheck 0.12.4
by runningcargo spellcheck
on https://github.com/rust-lang/rust/tree/22e9fe644ea710eec50cb0aabcae7fa8dd9fd675(see below note for why I've done
-j $(nproc)
cargo spellcheck -j 16
~/code/cargo-spellcheck/target/release/cargo-spellcheck -j 16
Closes # .
Changes proposed by this PR:
Notes to reviewer:
Not sure why the previous checking code was single threaded - it was definitely attempting to be parallel, but misunderstood the stream APIs.
tracing
, and have better (parallel) profiling tools.num_cpus::get
instead ofnum_cpus::get_physical
(orstd::thread::available_parallelism
).doc-chunks
Document
should probably be removed📜 Checklist
./demo
sub directory