avh4 / elm-format

elm-format formats Elm source code according to a standard set of rules based on the official Elm Style Guide
BSD 3-Clause "New" or "Revised" License

Process multiple files in parallel #770

Closed lydell closed 1 year ago

lydell commented 2 years ago

Fixes https://github.com/avh4/elm-format/issues/755. Related to https://github.com/avh4/elm-format/issues/183.

On my computer, it’s roughly 2.5 times faster! :tada:

| command | main | this PR |
| --- | --- | --- |
| `elm-format --validate .` | 7.7 s | 2.9 s |
| `elm-format --yes . >/dev/null` | 7.7 s | 2.9 s |
| `elm-format --yes .` | 7.7 s | 3.3 s |

main refers to commit 15578927a7fd327abd9c619031243c0abe96c57b.

Tested on a code base of this size:

❯ tokei --type Elm
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 Elm                   745       221612       178984         5131        37497

Computer specs:

Terminal: the time for `elm-format --yes .` (without `>/dev/null`) depends on your terminal emulator. With the default Apple Terminal, I got 3.3 seconds, as shown in the table above. With iTerm2, I got 5.5 seconds (and 9.7 seconds for main).

Notes:

avh4 commented 2 years ago

Awesome, thanks!!

Apologies, I'll probably be a little slow to merge this, since I want to re-read the InfoFormatter changes a couple of times and make sure I personally understand how Haskell's laziness interacts with the new code. (I trust that it doesn't break anything, both from a quick skim of the code and from the testing you've done, but I want to hold myself to a higher bar of understanding what the runtime behavior actually is.)
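For readers unfamiliar with the laziness concern mentioned here, the following is a minimal sketch of the general issue, not the code from this PR. `formatSource` and `formatInWorker` are hypothetical names invented for illustration. The point is that a forked thread can hand back an unevaluated thunk, in which case the actual work ends up happening in whichever thread later reads the result, so the worker has to force the value itself.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)

-- | Hypothetical stand-in for a pure formatting step:
-- strips trailing spaces from every line.
formatSource :: String -> String
formatSource = unlines . map stripEnd . lines
  where
    stripEnd = reverse . dropWhile (== ' ') . reverse

-- | Format in a forked thread and force the whole result there.
formatInWorker :: String -> IO String
formatInWorker source = do
  slot <- newEmptyMVar
  _ <- forkIO $ do
    let formatted = formatSource source
    -- Without this line, `formatted` would be stored as a thunk and
    -- computed lazily by the thread that eventually consumes it.
    -- Evaluating every Char forces the full String in the worker.
    mapM_ evaluate formatted
    putMVar slot formatted
  takeMVar slot

main :: IO ()
main = formatInWorker "module Main exposing (main)   \n" >>= putStr
```

For richer result types, `force` from the `deepseq` package is the usual tool for this; the `mapM_ evaluate` trick above only works because the result is a plain `String`.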

Here are a couple things I'll want to take a look at before I merge. Feel free to take a stab at them if you want, or no worries if you don't, I should have a chance to get to them in the next couple weeks.

avh4 commented 1 year ago

Testing version for MacOS arm64 is available here: https://github.com/avh4/elm-format/actions/runs/4141690402

avh4 commented 1 year ago

Just an additional benchmark on a Mac M1 using --validate on ~150k LOC:

(using hyperfine, https://github.com/sharkdp/hyperfine, run as `hyperfine --warmup 3 --shell=none`)

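| Platform | Version | Mean [s] | Min [s] | Max [s] | Relative |
| --- | --- | --- | --- | --- | --- |
| darwin-arm64 | this PR | 0.824 ± 0.006 | 0.815 | 0.832 | 1.00 |
| darwin-arm64 | main | 2.776 ± 0.015 | 2.758 | 2.805 | 3.37 ± 0.03 |
| darwin-x86 via Rosetta | this PR | 1.058 ± 0.012 | 1.042 | 1.085 | 1.28 ± 0.02 |
| darwin-x86 via Rosetta | main | 9.603 ± 0.096 | 9.475 | 9.770 | 11.65 ± 0.14 |
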
avh4 commented 1 year ago

I'm thinking that, using the print lock, it should now be possible to have the forked jobs just print their output and pass only a Bool back to the parent, which might also restore the streaming output of the JSON.

I took a brief look at doing this now, but it doesn't seem worth the time atm. Doing it in a clean way seems like it would require a way of coordinating an additional thread, which would fit better with the possible future work of using conduit.
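As a rough illustration of the print-lock idea discussed above (a sketch under assumptions, not elm-format's actual code), each forked job could print its own output while holding a shared lock and hand only a `Bool` back to the parent. `checkFile` and `runJobs` are hypothetical names, and the sketch uses only `base`.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, newMVar, putMVar, takeMVar, withMVar)
import Control.Monad (forM)

-- | Hypothetical stand-in for formatting or validating one file:
-- returns the message to print and whether the file passed.
checkFile :: FilePath -> IO (String, Bool)
checkFile path = pure ("Processed " ++ show path, not (null path))

-- | Fork one job per file. Each job prints under the shared lock, so
-- lines from different jobs never interleave, and returns only a Bool.
-- Caveats of this sketch: the number of threads is unbounded, and a job
-- that throws never fills its slot, leaving the parent blocked on it.
runJobs :: MVar () -> [FilePath] -> IO [Bool]
runJobs printLock paths = do
  slots <- forM paths $ \path -> do
    slot <- newEmptyMVar
    _ <- forkIO $ do
      (message, ok) <- checkFile path
      withMVar printLock $ \_ -> putStrLn message
      -- ($!) forces the Bool in the worker rather than in the parent.
      putMVar slot $! ok
    pure slot
  -- Results are collected in input order; print order is not deterministic.
  mapM takeMVar slots

main :: IO ()
main = do
  printLock <- newMVar ()
  results <- runJobs printLock ["Main.elm", "Page/Home.elm"]
  putStrLn (if and results then "All files OK" else "Some files failed")
```

This keeps the parent's bookkeeping small, but streaming structured output such as a JSON array would still need something to emit the opening and closing brackets and the separators in the right places, which is presumably the kind of extra coordination mentioned above.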

avh4 commented 1 year ago

Split off https://github.com/avh4/elm-format/issues/788 for possible future refactoring.

avh4 commented 1 year ago

Merged via https://github.com/avh4/elm-format/pull/789

Thank you!!