Closed joinr closed 3 years ago
This is weird, I can't recreate the results. Getting the exact same performance for both cases. Which JVM and Clojure versions were you using?
Yeah, I'm seeing similar results. The original run was on Java 8 with Ubuntu's JDK, whatever variant that was. I just ran it on Windows 10 with AdoptOpenJDK 1.8.0_222 and saw about a 1 ns difference. Considering it anomalous for now (or JDK-specific...).
I ran my benchmarks on Ubuntu 20.10 with Java 15. If this was about the JDK version, 8 is considered EOL and we can happily forget about it. If you have no objections I'm closing this issue, but do reopen it if you find it crops up again
"...8 is considered EOL and we can happily forget about it."
Not so. There are many many large orgs still running Java 8 for LTS (part of the reason amazon has their own, among others). Some places (including mine) are still there for the foreseeable future. Just FYI.
Well then, I'll try to recreate
I'll re-run on the original machine, an Ubuntu machine on EC2.
Reran with:
;; CIDER 1.1.0snapshot (package: 20210416.1915), nREPL 0.8.3
;; Clojure 1.10.1, Java 1.8.0_282
;; Ubuntu 20.10
Got the same results again, exactly the same performance in both cases.
I just re-ran it on the same setup, and reproduced:
Execution time mean : 121.353224 ns
Execution time mean : 36.097984 ns
openjdk version "1.8.0_282"
OpenJDK Runtime Environment (build 1.8.0_282-8u282-b08-0ubuntu1~16.04-b08)
OpenJDK 64-Bit Server VM (build 25.282-b08, mixed mode)
Weird, can you print out all the options and flags the JVM ran with?
running from lein repl with default options (no :jvm-opts entry).
lein 2.9.1, not that it matters.
Also running this on an EC2 instance; could be some throttling, but I'd expect that to be amortized over the multiple runs.
The JVM still has some default configurations, such as GC, server, client or tiered etc. There's a way to print out all of them, something like https://alvinalexander.com/java/how-see-jvm-parameters-arguments-from-running-java-application/
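For reference, a minimal sketch of one way to do that from a running REPL. It uses the standard `java.lang.management` API; the function name `jvm-input-args` is just a hypothetical helper for illustration.

```clojure
(import '(java.lang.management ManagementFactory))

(defn jvm-input-args
  "Returns the arguments the running JVM was started with, as a vector of strings.
  Hypothetical helper name; the underlying call is RuntimeMXBean.getInputArguments."
  []
  (vec (.getInputArguments (ManagementFactory/getRuntimeMXBean))))

;; Print one argument per line so flags such as -XX:... are easy to spot.
(run! println (jvm-input-args))
```

Running this in both environments and comparing the two lists should make any difference in flags visible.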
[:arg -Dfile.encoding=UTF-8]
[:arg -XX:-OmitStackTraceInFastThrow]
[:arg -XX:+TieredCompilation]
[:arg -XX:TieredStopAtLevel=1]
[:arg -Dclojure.compile.path=/home/tom/repos/spork/target/classes]
[:arg -Dspork.version=0.2.1.4-SNAPSHOT]
[:arg -Dclojure.debug=false]
this particular repo running clojure 1.10.1
Okay, I managed to recreate. One of these is the culprit:
"-Djdk.attach.allowAttachSelf" "-XX:+UnlockDiagnosticVMOptions" "-XX:+DebugNonSafepoints"
Printed the options with and without passing JVM opts flag:
;;; no jvm opts
["-Dfile.encoding=UTF-8"
"-XX:-OmitStackTraceInFastThrow"
"-XX:+TieredCompilation"
"-XX:TieredStopAtLevel=1"
"-Dclj-fast.version=0.0.10-SNAPSHOT"
"-Dclojure.debug=false"]
;;; jvm opts
["-Dfile.encoding=UTF-8"
"-Djdk.attach.allowAttachSelf"
"-XX:+UnlockDiagnosticVMOptions"
"-XX:+DebugNonSafepoints"
"-Dclj-fast.version=0.0.10-SNAPSHOT"
"-Dclojure.debug=false"]
I suspect -XX:TieredStopAtLevel
with :jvm-opts ["-XX:+UnlockDiagnosticVMOptions" "-XX:+DebugNonSafepoints"]
I get 20 ns for both runs now....both in lein repl and from cider.
Okay, tested with a variety of flags
The culprit is -XX:TieredStopAtLevel
For anything under 4 you'll get the performance difference because C2 doesn't kick in.
It's probably used by lein to reduce start-up time for the dev profile. I think lein run uses the dev profile.
I tried passing empty jvm-opts vector and got the same results for both, so I'm attributing this to dev profile.
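A sketch of how this could look in a `project.clj`, assuming the goal is to drop Leiningen's default tiered-compilation flags before benchmarking. The project name and version below are placeholders, and the `^:replace` metadata is the approach I understand Leiningen's FAQ to suggest, so check it against your lein version.

```clojure
;; project.clj (placeholder project name and version)
(defproject example-bench "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.10.1"]]
  ;; ^:replace asks Leiningen to use exactly this vector instead of merging it
  ;; with its defaults, so -XX:TieredStopAtLevel=1 should no longer be injected
  ;; and C2 can kick in.
  :jvm-opts ^:replace [])
```

After changing this, printing the JVM arguments again is a quick way to confirm the flag is really gone.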
Alright, yet more jvm corner case magic to understand. If nothing else, the issue is documented for posterity.
I wouldn't call it a corner case of the JVM, more of Leiningen IMO - it's an undocumented flag which is injected when you lein run and messes with performance. It's in Lein's source code.
I think I'll open an issue about it.
I think I'll end up adding a few notes to the readme, nothing more I can do in the meanwhile
This is the best I could do in the meanwhile https://github.com/bsless/clj-fast/commit/bba3ec09a29b9929d523dbff8d57242fe3fa285f
Edit suggestions?
No, good writeup (and nugget of opt knowledge). Feel free to close.
Doing some performance work for a local search optimization via simulated annealing. My travels included exploring different numeric representations and trade-offs, including using copy-on-write (COW) numeric arrays in some places vs. the persistent vector/transient vector defaults. It appears aclone is slow for some unknown reason (to me!):
java.util.Arrays/copyOf will beat it handily, and performance is a hair faster (for small arrays) than normal persistent vector's hinted assocN for a similar operation. I always thought aclone was more or less optimal....apparently not.
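A minimal sketch of the kind of comparison described here, not the exact benchmark that was run. It assumes a small `long` array and that criterium is on the classpath; `copy-aclone` and `copy-of` are hypothetical helper names.

```clojure
;; Two ways to copy a primitive long array; type hints avoid reflection.
(defn copy-aclone ^longs [^longs xs]
  (aclone xs))

(defn copy-of ^longs [^longs xs]
  (java.util.Arrays/copyOf xs (alength xs)))

(comment
  ;; Assumes criterium is available as a dependency.
  (require '[criterium.core :as c])
  (let [xs (long-array 10)]   ; array size is an arbitrary choice
    (c/quick-bench (copy-aclone xs))
    (c/quick-bench (copy-of xs))))
```

Timings from a sketch like this depend heavily on the JVM flags in effect, so it is worth checking the startup arguments before reading much into the numbers.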