Looking at jdk_util_0
!nativememinfo
JRE: 2,492,863,856 bytes / 6029 allocations
|
+--VM: 2,199,765,216 bytes / 5582 allocations
| |
| +--Classes: 331,699,056 bytes / 598 allocations
| | |
| | +--Shared Class Cache: 314,572,896 bytes / 2 allocations
| | |
| | +--Other: 17,126,160 bytes / 596 allocations
| |
| +--Modules: 537,968 bytes / 2167 allocations
| |
| +--Memory Manager (GC): 1,647,571,392 bytes / 495 allocations
| | |
| | +--Java Heap: 1,614,868,480 bytes / 1 allocation
!dumpallregions
+----------------+----------------+----------------+----------------+--------+----------------+----------------------
| region | start | end | subspace | flags | size | region type
+----------------+----------------+----------------+----------------+--------+----------------+----------------------
0000ffffb40a7910 000000009fc00000 00000000a0200000 0000ffffb4095430 00000009 600000 ADDRESS_ORDERED
0000ffffb40a7580 00000000ffda0000 00000000fff50000 0000ffffb40a1490 0000000a 1b0000 ADDRESS_ORDERED
0000ffffb40a71f0 00000000fff50000 0000000100000000 0000ffffb409ba20 0000000a b0000 ADDRESS_ORDERED
+----------------+----------------+----------------+----------------+--------+----------------+----------------------
It doesn't seem like excessive GCing; the heap hasn't been expanded much:
03:04:15.210075000 0x00000000001bdb00 j9mm.715 Event Thread: 1 -> timeToStartCollection: 2314 scanStall: 0 syncStall: 444 notifyStall: 51
03:04:15.210077000 0x00000000001bdb00 j9mm.545 Event Scav 1: stall_work= 0ms stall_complete= 0ms stall_sync= 0ms stall_count=0/0/3 free_lists=0/0 scan_lists=2/0
03:04:15.210079000 *0x00000000001c0000 j9mm.715 Event Thread: 2 -> timeToStartCollection: 435 scanStall: 401 syncStall: 374 notifyStall: 0
03:04:15.210080000 0x00000000001c0000 j9mm.545 Event Scav 2: stall_work= 0ms stall_complete= 0ms stall_sync= 0ms stall_count=0/2/3 free_lists=5/3 scan_lists=3/2
03:04:15.210244000 *0x0000000000000000 j9mm.140 Event Tilt ratio: 65
03:04:15.210415000 0x0000000000000000 j9mm.560 Event LocalGC end: rememberedsetoverflow=0 causedrememberedsetoverflow=0 scancacheoverflow=0 failedflipcount=0 failedflipbytes=0 failedtenurecount=0 failedtenurebytes=0 flipcount=3183 flipbytes=125600 newspace=1459200/2490368 oldspace=4279896/6291456 loa=315392/315392 tenureage=0
03:04:15.210883000 0x0000000000000000 j9mm.468 Event Cycle End: type 2 approximateFreeMemorySize 5739096
03:04:15.210902000 0x0000000000000000 j9mm.470 Event Allocation failure cycle end: newspace=1459200/2490368 oldspace=4279896/6291456 loa=315392/315392
03:04:15.210921000 0x0000000000000000 j9mm.134 Event Allocation failure end: newspace=1457136/2490368 oldspace=4279896/6291456 loa=3
The new OSU machines are much slower than the old ones; Adam pointed out that the compile time went from 18m to 54m.
Fails on jdk18 as well. We aren't testing this platform ATM with jdk19. https://openj9-jenkins.osuosl.org/job/Test_openjdk18_j9_sanity.openjdk_aarch64_linux_Nightly/73/ - ub20-aarch64-osu-2
Created https://github.com/ibmruntimes/openj9-openjdk-jdk/pull/450 to capture a javacore in addition to the core we already get on timeout.
In case this isn't machine related, here are the changes from the previously working build. https://github.com/eclipse-openj9/openj9/compare/eeedfa4d2c4...792a28f15e3 - nothing relevant. https://github.com/eclipse-openj9/openj9-omr/compare/e542d083538...f6cc0533cf7 - AArch64 vector changes here; not sure they are relevant.
@knn-k
There are also some timeouts in internal testing that pre-date the code changes in the previous comment. https://github.com/eclipse-openj9/openj9/issues/15164
I wonder if this is a real problem that only shows up on slower machines. The internal builds were broken for over a week, so we may not have seen when the problem was first introduced. The test isn't new and I don't see it recently un-excluded; it should have been running and passing for some time.
> that only shows up on slower machines
On slower machines or different processors.
This testcase was excluded in April 2020 because of timeouts. I enabled the testcase again in August last year. https://github.com/eclipse-openj9/openj9/issues/9040
I am not surprised if this testcase fails with a timeout on a slower machine.
While the switch to slower machines can explain why it started failing last night externally, that doesn't explain why we just started seeing it internally (https://github.com/eclipse-openj9/openj9/issues/15164) when it's been working for many months.
With the updated timeout handler we get a javacore. I won't keep reporting the external machine failures, but the javacore for the timeout tonight does show a lot of time spent in the "MainThread" (and nowhere else) as could be expected.
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly/205/
3XMTHREADINFO "MainThread" J9VMThread:0x00000000001CCA00, omrthread_t:0x0000FFFF94355D80, java/lang/Thread:0x000000009FC8E980, state:R, prio=5
3XMJAVALTHREAD (java/lang/Thread getId:0x16, isDaemon:false)
3XMJAVALTHRCCL jdk/internal/loader/ClassLoaders$AppClassLoader(0x000000009FC51250)
3XMTHREADINFO1 (native thread ID:0x244EEC, native priority:0x5, native policy:UNKNOWN, vmstate:CW, vm thread flags:0x00000081)
3XMTHREADINFO2 (native stack address range from:0x0000FFFF7859E000, to:0x0000FFFF785DE000, size:0x40000)
3XMCPUTIME CPU usage total: 952.504055050 secs, current category="Application"
3XMHEAPALLOC Heap bytes allocated since last GC cycle=0 (0x0)
3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at java/util/stream/Streams$RangeLongSpliterator.forEachRemaining(Streams.java:233(Compiled Code))
4XESTACKTRACE at java/util/Spliterator$OfLong.forEachRemaining(Spliterator.java:775)
4XESTACKTRACE at java/util/stream/AbstractPipeline.copyInto(AbstractPipeline.java:522)
4XESTACKTRACE at java/util/stream/AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:512)
4XESTACKTRACE at java/util/stream/ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
4XESTACKTRACE at java/util/stream/ReduceOps$5.evaluateSequential(ReduceOps.java:258)
4XESTACKTRACE at java/util/stream/ReduceOps$5.evaluateSequential(ReduceOps.java:248)
4XESTACKTRACE at java/util/stream/AbstractPipeline.evaluate(AbstractPipeline.java:239(Compiled Code))
4XESTACKTRACE at java/util/stream/ReferencePipeline.count(ReferencePipeline.java:709)
4XESTACKTRACE at org/openjdk/tests/java/util/stream/CountLargeTest.testRefLarge(CountLargeTest.java:53)
1XMTHDSUMMARY Threads CPU Usage Summary
NULL =========================
NULL
1XMTHDCATINFO Warning: to get more accurate CPU times for the GC, the option -XX:-ReduceCPUMonitorOverhead can be used. See the user guide for more information.
NULL
1XMTHDCATEGORY All JVM attached threads: 959.993005000 secs
1XMTHDCATEGORY |
2XMTHDCATEGORY +--System-JVM: 7.032051000 secs
2XMTHDCATEGORY | |
3XMTHDCATEGORY | +--GC: 0.163603000 secs
2XMTHDCATEGORY | |
3XMTHDCATEGORY | +--JIT: 6.241224000 secs
1XMTHDCATEGORY |
2XMTHDCATEGORY +--Application: 952.960954000 secs
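For context, the hot frames in the callstack above are the sequential stream-count path. A minimal sketch of the kind of workload testRefLarge drives (reconstructed from the stack trace, not copied from the OpenJDK test source; the EXPECTED constant and the filter stage are illustrative):
```java
import java.util.stream.LongStream;

public class CountLargeSketch {
    // One element more than Integer.MAX_VALUE, so the stream's size
    // exceeds int range -- the case the test is checking.
    static final long EXPECTED = 1L + Integer.MAX_VALUE;

    public static void main(String[] args) {
        long count = LongStream.range(0, EXPECTED)
                .mapToObj(e -> null)  // box into a ReferencePipeline (testRefLarge)
                .filter(e -> true)    // clear the SIZED flag so count() must traverse
                .count();             // drives RangeLongSpliterator.forEachRemaining
        System.out.println(count == EXPECTED);
    }
}
```
A single-threaded pass over more than two billion elements is consistent with the ~950 seconds of Application CPU charged to "MainThread" above.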
For comparison I ran the test on xlinux (internal fyre machine), and it took about 2m to run.
11.0.15
jdk_custom_0 Start Time: Thu Jun 2 20:45:18 2022 Epoch Time (ms): 1654227918061
jdk_custom_0 Finish Time: Thu Jun 2 20:47:07 2022 Epoch Time (ms): 1654228027144
Last nightly build
jdk_custom_0 Start Time: Thu Jun 2 20:51:47 2022 Epoch Time (ms): 1654228307810
jdk_custom_0 Finish Time: Thu Jun 2 20:53:47 2022 Epoch Time (ms): 1654228427413
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 94
model name : Intel Core Processor (Skylake, IBRS)
stepping : 3
microcode : 0x1
cpu MHz : 2199.996
cache size : 4096 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm cr
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit srbds
bogomips : 4399.99
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
All the internal aarch64 machines are busy ATM. If I find one idle later I can try a similar manual test on this platform.
CountLargeTest took 20 minutes to finish in the previous run on cent8-aarch64-3: https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly/203/testReport/
The number has been similar in older jobs -- 19 minutes on cent8-aarch64-2 in April: https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly/175/testReport/
> AArch64 vector changes here; not sure they are relevant
Those PRs added AArch64 vector instructions, but they are not used anywhere yet.
I ran the testcase in my local environment using three builds, and I see no particular performance degradation. See the results below.
May 31 nightly build (https://openj9-jenkins.osuosl.org/job/Build_JDK17_aarch64_linux_Nightly/214/) 3 min 40 sec
jdk_custom_0 Start Time: Fri Jun 3 16:19:03 2022 Epoch Time (ms): 1654240743393
jdk_custom_0 Finish Time: Fri Jun 3 16:22:43 2022 Epoch Time (ms): 1654240963064
June 1 nightly build (https://openj9-jenkins.osuosl.org/job/Build_JDK17_aarch64_linux_Nightly/215/) 3 min 41 sec
jdk_custom_0 Start Time: Fri Jun 3 16:24:35 2022 Epoch Time (ms): 1654241075833
jdk_custom_0 Finish Time: Fri Jun 3 16:28:16 2022 Epoch Time (ms): 1654241296616
JDK 17 v0.32.0 (https://github.com/ibmruntimes/semeru17-binaries/releases/tag/jdk-17.0.3%2B7_openj9-0.32.0) 3 min 46 sec
jdk_custom_0 Start Time: Fri Jun 3 16:29:39 2022 Epoch Time (ms): 1654241379494
jdk_custom_0 Finish Time: Fri Jun 3 16:33:25 2022 Epoch Time (ms): 1654241605395
Thanks. The 20m is for both variants of the test, which explains why it didn't time out, since each variant runs for about 10m. I thought our regular machines were fast, but I guess your local test environment is much faster.
I still don't know why we got the timeouts internally. Is it possible that sometimes the JIT just doesn't optimize it as much?
the jdk19 build that timed out
job/Test_openjdknext_j9_sanity.openjdk_aarch64_linux_Personal/2 | machine | time |
---|---|---|
same | ub18-aarch64-4 | 16 and 11 (27 - 16) |
the 0.23 M1 build that timed out
job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux/199 | machine | time |
---|---|---|
job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_testList_0/126 | cent7-aarch64-4 | 16 |
job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_testList_1/126 | ub18-aarch64-9 | 16 |
There was a grinder on the M1 build that didn't reproduce it (20 iterations each). ** marks the same machine that timed out earlier.
job | machine | time | avg |
---|---|---|---|
job/Grinder_iteration_0/1229 | ub18-aarch64-8 | 3h 29m | 10.45 |
job/Grinder_iteration_1/1116 | ub18-aarch64-2 | 3h 25m | 10.25 |
job/Grinder_iteration_2/1038 | cent7-aarch64-4** | 3h 18m | 9.9 |
job/Grinder_iteration_3/946 | ub18-aarch64-5 | 3h 19m | 9.95 |
job/Grinder_iteration_4/903 | cent7-aarch64-6 | 3h 27m | 10.35 |
Recent runs
job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux/198 | machine | time |
---|---|---|
job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_testList_0/125 | ub18-aarch64-9 | 13 |
job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_testList_1/125 | ub18-aarch64-7 | 12 |
job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux/200 | machine | time |
---|---|---|
job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_testList_0/127 | cent7-aarch64-5 | 10 |
job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_testList_1/127 | ub18-aarch64-2 | 12 |
job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux/201 | machine | time |
---|---|---|
job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_testList_0/128 | cent7-aarch64-2 | 13 |
job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_testList_1/128 | cent7-aarch64-3 | 13 |
Another interesting question is why this test is only slow on jdk17+. Optimization problems with OpenJDK MH? It's the same test, running and passing on jdk11 in about 10m, while timing out on jdk17+ at 16m.
Yes, it is much faster with JDK 11 than with JDK 17 in my local environment.
June 1 JDK 11 nightly build (https://openj9-jenkins.osuosl.org/job/Build_JDK11_aarch64_linux_Nightly/182/) 1 min 42 sec
jdk_custom_0 Start Time: Mon Jun 6 09:36:01 2022 Epoch Time (ms): 1654475761216
jdk_custom_0 Finish Time: Mon Jun 6 09:37:43 2022 Epoch Time (ms): 1654475863715
It was 3 min 41 sec using the JDK 17 nightly build of the same day, as I reported above.
I took some perf data on an internal machine. It turned out that ~40% of CPU time was spent on patching the call site of interface dispatch on recompilation. We cache the receiver class and method entry address for interface calls. When the cache hits, the entry address is loaded from the cache and used for dispatch. When recompilation happens, the call site is updated so that it branches to the updated (recompiled) method, but it seems that it is not updated when the interface call is dispatched via the cached address.
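For illustration, here is a rough Java model of the cache and the stale fast path just described (all names are invented; the real cache lives in JIT-generated AArch64 code and runtime glue, not Java):
```java
// Hypothetical model of a per-call-site interface dispatch cache.
// Field and method names are invented for illustration only.
final class InterfaceCallSiteModel {
    Class<?> cachedReceiverClass; // receiver class recorded on the last slow path
    long cachedEntryAddress;      // entry point of the callee body at that time

    long targetOf(Object receiver, long currentEntryAddress) {
        if (receiver.getClass() == cachedReceiverClass) {
            // Fast path: dispatch via the cached address. If the callee was
            // recompiled since, this address is stale: recompilation patches
            // the call site's direct branch but not this cached slot, so the
            // old body (and its patch-the-caller stub) is entered on every
            // call -- the repeated patching seen in the perf profile.
            return cachedEntryAddress;
        }
        // Slow path: full interface lookup (modeled here by the caller
        // supplying the current entry), then refill the cache.
        cachedReceiverClass = receiver.getClass();
        cachedEntryAddress = currentEntryAddress;
        return cachedEntryAddress;
    }
}
```
Under this model, the fix would be to update or invalidate the cached slot when the callee is recompiled, so the fast path stops funneling calls through the stale body.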
I opened a draft PR to resolve it. https://github.com/eclipse-openj9/openj9/pull/15242
Significant improvement with PR #15242 for this testcase on AArch64 macOS as shown below. The gap is smaller, but JDK 11 is still faster than JDK 17:
Build | JDK 11 | JDK 17 | Gap (11 vs 17) |
---|---|---|---|
June 8 Nightly | 91 sec. | 157 sec. | x1.7 |
With PR #15242 | 36 sec. | 44 sec. | x1.2 |
Speedup by the PR | x2.5 | x3.6 | |
Good job, @Akira1Saitoh !
AArch64 Linux results have also improved between June 8 and 9. They are all on ub20-aarch64-osu-x. I think we can close this issue.
Build | JDK 11 | JDK 17 |
---|---|---|
June 8 Nightly | 10 min. | 16 min. (timeout) |
June 9 Nightly | 3 min. 51 sec. | 4 min. 3 sec. |
Is it understood why jdk17 is slower than jdk11? Do we have an issue to address that?
Is the change low risk enough to put into the 0.33 release?
I haven't looked into the difference between JDK 11 and 17. I opened Issue #15301 for tracking the performance gap.
> Is the change low risk enough to put into the 0.33 release?
It has been four days since PR #15242 was merged, and there are no failures caused by that change AFAIK. I guess we can put it in 0.33.
Opened https://github.com/eclipse-openj9/openj9/pull/15314 for 0.33.
I found similar performance gaps with CountLargeTest between JDK 11 and 17 on other platforms. See #15301.
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly/204 - ub20-aarch64-osu-1
https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly/204/openjdk_test_output.tar.gz
There are cores in the following locations, but no javacore, which would be useful.
aqa-tests\TKG\output_16541328857881\jdk_util_0\work\java\util\stream\test\org\openjdk\tests\java\util\stream\CountLargeTest
aqa-tests\TKG\output_16541328857881\jdk_util_1\work\java\util\stream\test\org\openjdk\tests\java\util\stream\CountLargeTest
jdk_util_0
-Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:+UseCompressedOops
java/util/stream/test/org/openjdk/tests/java/util/stream/CountLargeTest.java
jdk_util_1
-Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-UseCompressedOops
java/util/stream/test/org/openjdk/tests/java/util/stream/CountLargeTest.java