drahnr / cargo-spellcheck

Checks all your documentation for spelling and grammar mistakes with hunspell and a nlprule based checker for grammar
Apache License 2.0
314 stars 32 forks source link

Parallelize checking for 10x performance win #306

Closed aatifsyed closed 11 months ago

aatifsyed commented 11 months ago

What does this PR accomplish?

comparison with cargo-spellcheck 0.12.4 by running cargo spellcheck on https://github.com/rust-lang/rust/tree/22e9fe644ea710eec50cb0aabcae7fa8dd9fd675

(see below note for why I've done -j $(nproc)

$ hyperfine "cargo spellcheck -j $(nproc)" "~/code/cargo-spellcheck/target/release/cargo-spellcheck -j $(nproc)" --export-markdown hyperfine.md 
Benchmark 1: cargo spellcheck -j 16
  Time (mean ± σ):     356.515 s ± 11.933 s    [User: 409.724 s, System: 155.652 s]
  Range (min … max):   340.755 s … 374.962 s    10 runs

Benchmark 2: ~/code/cargo-spellcheck/target/release/cargo-spellcheck -j 16
  Time (mean ± σ):     32.953 s ±  2.922 s    [User: 353.613 s, System: 13.498 s]
  Range (min … max):   26.296 s … 36.698 s    10 runs

Summary
  '~/code/cargo-spellcheck/target/release/cargo-spellcheck -j 16' ran
   10.82 ± 1.03 times faster than 'cargo spellcheck -j 16'
Command Mean [s] Min [s] Max [s] Relative
cargo spellcheck -j 16 356.515 ± 11.933 340.755 374.962 10.82 ± 1.03
~/code/cargo-spellcheck/target/release/cargo-spellcheck -j 16 32.953 ± 2.922 26.296 36.698 1.00

Closes # .

Changes proposed by this PR:

Notes to reviewer:

Not sure why the previous checking code was single threaded - it was definitely attempting to be parallel, but misunderstood the stream APIs.

📜 Checklist

drahnr commented 11 months ago

Thank you for your PR!

I indeed misread the API documentation of buffered.

Limiting to the physical CPUs was intended, most people run this on their local machines and still want to be able to work while the (then very long) checks were running.

Let me bench the PR, and then let's discuss :)

drahnr commented 11 months ago

My results appear to be very different on 40k loc project using the following command:

cargo b --release; hyperfine --show-output -r 2 "./target/release/cargo-spellcheck check -j16 --code 0 ../secret/project"

Current main branch

# git 8e4cfcce7f8cf3984664feba44e90dabe9d9fcf1
Benchmark 1: ./target/release/cargo-spellcheck check -j16 --code 0 ../gensyn/node
  Time (mean ± σ):     191.1 ms ±   5.7 ms    [User: 246.0 ms, System: 55.1 ms]
  Range (min … max):   187.1 ms … 195.2 ms    2 runs
# git 6c50e07efa303aa37fff73be1d7b187b36dc3e22
Benchmark 1: ./target/release/cargo-spellcheck check -j16 --code 0 ../gensyn/node
  Time (mean ± σ):     184.6 ms ±   3.2 ms    [User: 236.8 ms, System: 58.1 ms]
  Range (min … max):   182.4 ms … 186.9 ms    2 runs

which aligns with the CPU utilization I see - it's only one core regardless.

drahnr commented 11 months ago
Command Mean [ms] Min [ms] Max [ms] Relative
./cargo-spellcheck-8e4cfcce7f8cf3984664feba44e90dabe9d9fcf1 -j 16 --code 0 ../gensyn/node 180.1 ± 6.3 170.1 191.2 1.01 ± 0.05
./cargo-spellcheck-6c50e07efa303aa37fff73be1d7b187b36dc3e22 -j 16 --code 0 ../gensyn/node 178.5 ± 6.7 171.0 191.0 1.00
hyperfine --show-output "./cargo-spellcheck-8e4cfcce7f8cf3984664feba44e90dabe9d9fcf1 -j 16 --code 0 ../gensyn/node" "./cargo-spellcheck-6c50e07efa303aa37fff73be1d7b187b36dc3e22 -j 16 --code 0 ../gensyn/node" --export-markdown hyperfine.md 
Benchmark 1: ./cargo-spellcheck-8e4cfcce7f8cf3984664feba44e90dabe9d9fcf1 -j 16 --code 0 ../gensyn/node
  Time (mean ± σ):     180.1 ms ±   6.3 ms    [User: 238.3 ms, System: 59.6 ms]
  Range (min … max):   170.1 ms … 191.2 ms    15 runs

Benchmark 2: ./cargo-spellcheck-6c50e07efa303aa37fff73be1d7b187b36dc3e22 -j 16 --code 0 ../x
  Time (mean ± σ):     178.5 ms ±   6.7 ms    [User: 235.5 ms, System: 57.5 ms]
  Range (min … max):   171.0 ms … 191.0 ms    15 runs

Summary
  ./cargo-spellcheck-6c50e07efa303aa37fff73be1d7b187b36dc3e22 -j 16 --code 0 ../gensyn/node ran
    1.01 ± 0.05 times faster than ./cargo-spellcheck-8e4cfcce7f8cf3984664feba44e90dabe9d9fcf1 -j 16 --code 0 ../x

I do understand that your code should be faster, but practically it's just about even. I think it'd be worth looking into a) why our outcomes are different and b) why there is only one CPU core used.

drahnr commented 11 months ago

@aatifsyed see the above monologue :)

aatifsyed commented 11 months ago

Interesting - just to clarify my understanding, on this branch you're seeing only a single core used?

drahnr commented 11 months ago

Interesting - just to clarify my understanding, on this branch you're seeing only a single core used?

Yes, exactly. The pattern differs, but there is only ever one CPU used at 100% with this branch. I suspected it being IO bound or a lock in stdout (hypothesis).

aatifsyed commented 11 months ago

That's very surprising.

I've prepared tracing on two branches for you: https://github.com/aatifsyed/cargo-spellcheck/tree/parallel-traced (my code) https://github.com/aatifsyed/cargo-spellcheck/tree/single-thread-traced (pre-existing code)

Running each will give you a trace-<unix_seconds>.json that you can view at https://ui.perfetto.dev, or chome://tracing in a chromium browser.

Here are the traces on my machine (for a bare cargo spellcheck invocation), on https://github.com/ChainSafe/forest/:

$ loc
--------------------------------------------------------------------------------
 Language             Files        Lines        Blank      Comment         Code
--------------------------------------------------------------------------------
 Rust                   313        51683         5975         5028        40680
 Markdown                27         4669          823            0         3846
...

parallel-traced (my code) image you can see the multiple threads of execution, taking ~6s trace-parallel.json.txt

single-thread-traced (pre-existing code) image you can see that there only a single thread, taking ~45s trace-single-threaded.json.txt

Want to try showing the traces from runs on your machine?

aatifsyed commented 11 months ago

I suspected it being IO bound or a lock in stdout (hypothesis).

I'd be surpised if this was the case!

drahnr commented 11 months ago

parallel.json.txt vanilla.json.txt

I am on fedora 38 AMD 64 with a 16 core processor. Are you on Mac OS X?

drahnr commented 11 months ago

forst-parallel.json.txt forest-vanilla.json.txt

When running on forest, I indeed see the ~10x speedup.


I'll have to debug why this is not the case for the private repo, but I don't see any obstacles to merging this as is.

Could you open up follow up PR introducing the tracing?

drahnr commented 11 months ago

Alright, I think I figured the root cause, it's rather unintuitive cli flag interpretation.