plexus opened this issue 3 years ago
thanks @plexus for letting me know, will have a look
Thank you @benedekfazekas! Observing a little with `jstack`, I noticed a lot of `clojure-agent-send-off-pool` threads, over 200, with up to ~100 RUNNABLE at once. Perhaps worth looking at how futures are used.
was thinking about that; maybe we should check if we need to upgrade `parallel`
From what I see, `parallel` is only used for a single `p/pmap` call. Perhaps it can be simplified to a mere `pmap`?
It's what I use in sibling projects like refactor-nrepl, Orchard, etc.
`core/pmap` is smarter than it might seem: it doesn't process more items in parallel than there are CPUs available, and it reuses threads from a pool, so one isn't constantly spinning up new threads for inputs with many items. The only caveat is that it's lazy, so if going this route I'd recommend wrapping the `pmap` in a `doall` or such.
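Roughly, the pattern being suggested would look like this (a minimal sketch, not MrAnderson code; `process-file` and `files` are hypothetical placeholders):

```clojure
;; pmap keeps the parallelism roughly bounded by the CPU count, but it is
;; lazy, so wrapping it in doall forces the work to happen eagerly.
(defn process-all
  "Eagerly maps `process-file` over `files` using pmap's built-in,
  CPU-bounded thread pool."
  [process-file files]
  (doall (pmap process-file files)))
```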
And as Arne points out, there's `future` usage as well.
That's where things seem to go awry: `future` (`defn replace-ns-symbol`) is ultimately called inside a `pmap` (`defn replace-ns-symbol-in-source-files`), so one gets a thread fest.
As a superficial impression/suggestion, I'd remove the `future`s, making those computations synchronous, and let the surrounding `pmap` take care of the parallelism. That way one gets rid of the mostly-redundant thread spawning and bounds the thread count by the CPU count.
If this sounds sensible I can PR these!
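For illustration, a hedged before/after sketch of removing the nested `future`s (`replace-in-file` and `files` are hypothetical placeholders, not MrAnderson's actual functions):

```clojure
;; Before: each pmap task spawns (and blocks on) its own future, so extra
;; threads pile up on top of pmap's own worker threads.
(defn replace-all-nested [replace-in-file files]
  (doall (pmap (fn [f] @(future (replace-in-file f))) files)))

;; After: do the work synchronously inside pmap and let pmap alone bound
;; the parallelism by the CPU count.
(defn replace-all [replace-in-file files]
  (doall (pmap replace-in-file files)))
```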
thanks for the analysis @vemv, happy to get a PR on this
We simplified/reduced parallelism in https://github.com/benedekfazekas/mranderson/pull/60 .
No slowness could be repro'd in CI; however, we kept seeing it on our laptops. Could be a macOS thing, assuming you are also on Macs?
Any updates here? I got reminded about this while trying to deploy https://github.com/clojure-emacs/cider-nrepl/releases/tag/v0.27.3 . I'm using Ubuntu, btw (via WSL).
Btw, it might be good to cut a release with the improvement @vemv has already made.
not since @vemv's latest comment. maybe it would make sense to move creating the release candidate to CI.
release: I have two pending issues I want to solve/close before the next release. maybe this year.
Fair enough.
FWIW, I experienced the slowness when running the refactor-nrepl v3.2.2 tests (refactor-nrepl now references MrAnderson `0.5.4-SNAPSHOT`) on macOS 10.15.7 with JDK 17.
$ java --version
openjdk version "17.0.2" 2022-01-18
OpenJDK Runtime Environment Temurin-17.0.2+8 (build 17.0.2+8)
OpenJDK 64-Bit Server VM Temurin-17.0.2+8 (build 17.0.2+8, mixed mode, sharing)
Switching to JDK11:
$ java --version
openjdk 11.0.13 2021-10-19
OpenJDK Runtime Environment Temurin-11.0.13+8 (build 11.0.13+8)
OpenJDK 64-Bit Server VM Temurin-11.0.13+8 (build 11.0.13+8, mixed mode)
Made things nice and speedy again.
Ok, I'm curious about this one. I'll use refactor-nrepl v3.5.5 as my guinea pig. It uses the MrAnderson `0.5.4-SNAPSHOT` release.
Here's what I'll run:
lein clean
rm -f .inline-deps
/usr/bin/time lein with-profile -user,+1.10 inline-deps
(not sure that the `rm` of `.inline-deps` is relevant, but refactor-nrepl does it, so I'll do it too).
I'm on macOS 10.15.7, 3.5 GHz Quad-Core Intel Core i7, 32 GB RAM. I am using sdkman to choose JDKs.
| sdk use java | timing results | human friendly elapsed | judgement |
|---|---|---|---|
| 11.0.16.1-tem | 98.74 real 430.80 user 11.60 sys | 1m39s | good |
| 17.0.4.1-tem | 82.93 real 344.64 user 14.26 sys | 1m23s | good |
| 17.0.2-tem | 3456.73 real 5179.71 user 5256.95 sys | 57m37s | eek! |
| 17.0.3-tem | 4219.20 real 5085.88 user 3904.21 sys | 1h10m19s | eek! |
So whatever the cause of the slowdown, it does not exhibit itself for me on the current JDK17 release.
One guess on the cause is this JDK bugfix, which is a backport of this JDK bugfix.
nice one, especially finding the jdk issues. can you check if setting the `mranderson.internal.no-parallelism` sys property makes any difference on the poorly performing jdks?
suppose we can just add a note to the readme to use jdk 11, or 17.0.4.1 or later, and mention the above sys property if it helps.
Good idea! A repeat of tests with parallelism disabled:
lein clean
rm -f .inline-deps
LEIN_JVM_OPTS=-Dmranderson.internal.no-parallelism=true /usr/bin/time lein with-profile -user,+1.10 inline-deps
| sdk use java | timing results | human friendly elapsed | judgement |
|---|---|---|---|
| 11.0.16.1-tem | 144.86 real 164.35 user 9.88 sys | 2m25s | slower, but good |
| 17.0.4.1-tem | 122.72 real 133.24 user 11.10 sys | 2m3s | slower, but good |
| 17.0.2-tem | 124.16 real 135.05 user 11.84 sys | 2m4s | well, well! |
| 17.0.3-tem | 127.53 real 137.23 user 11.88 sys | 2m8s | huh! |
So the parallelism is what's hitting the buggy JDKs so hard.
Just to make sure, I'll rerun one of the prior tests with parallelism enabled:

| sdk use java | timing results | human friendly elapsed | judgement |
|---|---|---|---|
| 17.0.3-tem | 3005.35 real 5028.85 user 8029.54 sys | 50m5s | reproducible eek! |
Some options/ideas:
I think option 1 is a no-brainer. Option 2 should be relatively easy and would avoid support questions. Options 4 and 5 pique my curiosity.
I'm getting a better understanding of the work that MrAnderson is doing.
It is repeating a lot of file slurping and parsing that it maybe does not have to.
Idea 1. I'm thinking it would be interesting to explore analyzing all the work that needs to be done first, then applying that work in one pass (or a few, if that makes more sense) over the source tree.
Idea 2. It might also be interesting to explore employing clj-kondo's analysis data. It could tell us what namespaces are used where. It might not help much (or at all) with the imports, but I think those are already handled pretty efficiently.
I'll start exploring idea 1. Idea 2 looks interesting but does bring in more deps.
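In case it helps size up idea 2, here's a rough, hedged sketch of pulling namespace usages out of clj-kondo's analysis data (assuming clj-kondo's `clj-kondo.core/run!` API with `:config {:analysis true}`; the exact keys used and how they would map onto MrAnderson's needs are assumptions):

```clojure
(require '[clj-kondo.core :as kondo])

(defn namespace-usages
  "Runs clj-kondo analysis over `paths` and returns which namespaces are
  referenced from which files."
  [paths]
  (let [{:keys [analysis]} (kondo/run! {:lint paths
                                        :config {:analysis true}})]
    (for [{:keys [filename from to]} (:namespace-usages analysis)]
      {:file filename :from from :uses to})))
```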
I guess the key fns to look at here are `mranderson.move/replace-ns-symbol` and `mranderson.move/update-file`. This ns is relatively well documented, and yes, there are many runs, as mranderson needs to go through the dependency tree and replace references to deps' vars again and again in the dep files it already processed for a previous dependency. maybe there is a way to simplify this (and therefore make it more efficient) with clj-kondo
Even without clj-kondo, I think we could work out the list of renames we need to make first, then apply them all at once to each file. Right now, I think renames are applied one at a time to each file.
I'm thinking that instead of this pseudo-code:
(doseq [r renames, f files] (->> (slurp f) parse (update-with r) (spit f)))
We would instead
(doseq [f files] (->> (slurp f) parse (update-with renames) (spit f)))
Which would mean a lot less slurping, parsing, and spitting (somebody who doesn't know Clojure would now be grossed out!).
I'm guessing this would probably make the pmap unnecessary.
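A slightly fuller, hedged sketch of that single-pass idea; for illustration it uses plain string replacement, whereas the real rewrite is syntax-aware, and all names here are hypothetical:

```clojure
(require '[clojure.string :as str])

(defn apply-renames
  "Applies every old->new rename in `renames` (a map of strings) to `source`."
  [renames source]
  (reduce (fn [s [from to]] (str/replace s from to)) source renames))

(defn rewrite-all!
  "Slurps each file once, applies all the renames, and spits it back once,
  instead of doing a read/parse/write cycle per rename."
  [renames files]
  (doseq [f files]
    (spit f (apply-renames renames (slurp f)))))
```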
slurp, spit: 🤣 ah, maybe I misunderstood you previously, as I thought you meant the new rounds of processing for every dependency. this might work and sounds like the right thing to do.
Yeah, I don't think I was entirely clear. Pseudocode helps!
I'll give idea 1 a whirl and learn what, if any, downsides there are to this approach.
I got distracted by some other work; I am looking forward to getting back to this very soon.
thanks, all your contributions here are appreciated already. following rewrite-clj, I realise you are quite busy elsewhere too
Comparing Java 11 to Java 16, it seems that on the latter MrAnderson is an order of magnitude slower than on the former.
Using sdkman
So 5 minutes instead of 28 seconds. A cider-nrepl build is taking half an hour on my machine.
Not sure if this is a red herring, but on the latter I'm getting this warning