Closed mbutrovich closed 3 weeks ago
I will update with some benchmark results tomorrow, but initial results look promising.
I ran some benchmarks locally and confirmed a speedup:
The speedup on q4 is pretty impressive!
Here are the raw JSON benchmark result files:
Thanks for running the benchmarks for me. I was struggling to get reproducible results locally.
~I wonder why there is such a large regression with q72 though~
edit: posted the wrong pngs from the wrong benchmark - updated now
- I think I understood the jni crate's docs with respect to GlobalRef, but a sanity check on if this approach could hold references longer than we want (and leak) would be helpful.
This seems correct to me
I had earlier posted fresh benchmarks that showed a big improvement with the latest commit, but I had inadvertently enabled the new replaceSortMergeJoin feature. I ran again without that enabled and essentially see the same results as the original run (367 seconds versus the earlier 365, which is likely just noise).
- What is the thread safety of this approach? It's unclear to me if multiple threads could be sharing this call stack and trying to write new values into the cache at the same time. I could wrap the HashMap in a latch in exchange for a performance hit, but would like to understand if this is even possible.
Spark has a single thread calling CometExecIterator, which in turn calls createPlan, executePlan, and releasePlan, so I think the current approach is safe.
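For comparison, the lock-protected alternative raised in the question could look roughly like the sketch below. It is only an illustration of the trade-off, not code from this PR: the `SharedCache` alias, the `get_or_insert` and `run_concurrently` helpers, and the use of a `usize` id as a stand-in for a cached jstring handle are all assumptions, and a standard `Mutex` is used in place of whatever latch the native code might choose.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical shared cache: metric name -> id (a stand-in for a cached jstring handle).
type SharedCache = Arc<Mutex<HashMap<String, usize>>>;

/// Look up `name`, inserting a fresh id while holding the lock if it is absent.
fn get_or_insert(cache: &SharedCache, name: &str) -> usize {
    let mut map = cache.lock().unwrap();
    let next_id = map.len();
    *map.entry(name.to_string()).or_insert(next_id)
}

/// Spawn `threads` workers that all request the same names and
/// return how many distinct entries the cache holds afterwards.
fn run_concurrently(threads: usize, names: &[&'static str]) -> usize {
    let cache: SharedCache = Arc::new(Mutex::new(HashMap::new()));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let cache = Arc::clone(&cache);
            let names = names.to_vec();
            thread::spawn(move || {
                for n in names {
                    get_or_insert(&cache, n);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let len = cache.lock().unwrap().len();
    len
}
```

Every lookup pays for the lock here, which is the performance cost mentioned above; with a single calling thread that cost buys nothing.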
Which issue does this PR close?
Partially addresses #1024.
Rationale for this change
Comet uses JNI jstrings as the keys to updating metrics values on the Spark side during execution. As described in #1024, currently Comet allocates a jstring for every metric for every invocation of metrics updating. The calls to jni_NewStringUTF account for over 1% of the on-CPU time in TPC-H SF10 for me.
What changes are included in this PR?
Added a HashMap that maps the native string to a jstring to use in JNI calls. This has the benefit of being many-to-one, whereby multiple nodes with the same metric name will benefit from the cached jstring. This cache is populated on demand: if the entry isn't present, we allocate a jstring and insert it into the cache.
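The populate-on-demand behaviour described above can be sketched as follows. This is a simplified illustration, not the code in this PR: `FakeJString` is a hypothetical stand-in for a JNI string reference (real code would hold something like the jni crate's GlobalRef), and `MetricNameCache`, `get_or_insert`, and the `allocations` counter are names invented for the example.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a JNI reference to a Java string.
#[derive(Debug)]
struct FakeJString {
    value: String,
}

/// Cache from native metric name to its (stand-in) Java string.
struct MetricNameCache {
    entries: HashMap<String, FakeJString>,
    /// Counts how many times a new string had to be allocated.
    allocations: usize,
}

impl MetricNameCache {
    fn new() -> Self {
        Self {
            entries: HashMap::new(),
            allocations: 0,
        }
    }

    /// Return the cached string for `name`, allocating it only on a miss.
    fn get_or_insert(&mut self, name: &str) -> &FakeJString {
        if !self.entries.contains_key(name) {
            // In real JNI code, the expensive string allocation would happen here.
            self.allocations += 1;
            self.entries.insert(
                name.to_string(),
                FakeJString {
                    value: name.to_string(),
                },
            );
        }
        &self.entries[name]
    }
}
```

Because the key is the metric name rather than the plan node, several nodes reporting the same metric share one entry, so repeated metric updates allocate once per distinct name.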
I have some thoughts about this approach that I would love for reviewers to comment on:
ExecutePlan. However, DF's metrics are Options and don't actually appear to be there until the plan starts executing.
How are these changes tested?
Existing tests on the Java side that exercise metrics.