cjjdespres opened this issue 4 months ago
Attn @mpirvu.
Could you please add some context? Is this a new failure that started recently? Is this a new test? Does it happen only on Java 11, or is it seen with other versions of Java as well?
It started failing recently - I looked at that particular configuration (JDK 11, x86_64) and saw one failure here that I had missed, but nothing before that. I haven't seen this on other platforms or other Java versions yet.
The test does not appear to be recent - I'm not sure exactly where it's defined, but I saw it mentioned in an issue from 2018.
A 100x grinder using the nightly build from May 15 passed. Version information:
10:52:18 openjdk version "11.0.24-internal" 2024-07-16
10:52:18 OpenJDK Runtime Environment (build 11.0.24-internal+0-adhoc..BuildJDK11x86-64linuxjitPersonal)
10:52:18 Eclipse OpenJ9 VM (build master-40817e8, JRE 11 Linux amd64-64-Bit Compressed References 20240515_1845 (JIT enabled, AOT enabled)
10:52:18 OpenJ9 - 40817e8
10:52:18 OMR - b9a6ccc
10:52:18 JCL - 795d069 based on jdk-11.0.24+2)
The earliest failure I saw was the May 31st failure that I noted in https://github.com/eclipse-openj9/openj9/issues/19610#issuecomment-2147606051, and that had version information:
# OpenJDK Runtime Environment (build 11.0.24-internal+0-adhoc..BuildJDK11x86-64linuxjitPersonal)
# Eclipse OpenJ9 VM (build master-ff7726e, JRE 11 Linux amd64-64-Bit Compressed References 20240531_1861 (JIT enabled, AOT enabled)
# OpenJ9 - ff7726e
# OMR - 6463b76
# JCL - bea3662 based on jdk-11.0.24+5)
This test completes quite quickly, so I can probably bisect the failure and narrow it down to a single day.
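Roughly, the plan is a binary search over nightly build dates. A minimal sketch of the idea, where grindFails is just a stand-in for submitting a grinder against that date's nightly and checking whether any iteration fails (not a real API):

```java
import java.util.List;
import java.util.function.Predicate;

// Sketch only: bisect over a list of nightly build dates, assuming the first
// date is known good and the last is known bad. "grindFails" stands in for
// running the test many times against that nightly and reporting any failure.
public class NightlyBisect {
    static String firstFailingDate(List<String> dates, Predicate<String> grindFails) {
        int good = 0;                        // index of a build where no failure was seen
        int bad = dates.size() - 1;          // index of a build where the failure reproduced
        while (bad - good > 1) {
            int mid = (good + bad) / 2;
            if (grindFails.test(dates.get(mid))) {
                bad = mid;                   // failure reproduced at or before this date
            } else {
                good = mid;                  // no failure seen; for an intermittent bug this
            }                                // is only weak evidence that the build is clean
        }
        return dates.get(bad);
    }
}
```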
Some mixed results from grinders:
There might have been a change between May 31 and June 3 that made this more likely to occur. I'll run more tests, but at least for the May 15-31 test period (so changes from May 14-30) the failure is more intermittent, if present at all.
I repeated the bisection with 500x grinders.
The latest successful version (0/500 failures, from the May 27 nightly run) was:
OpenJDK Runtime Environment (build 11.0.24-internal+0-adhoc..BuildJDK11x86-64linuxjitPersonal)
Eclipse OpenJ9 VM (build master-ce04759, JRE 11 Linux amd64-64-Bit Compressed References 20240527_1857 (JIT enabled, AOT enabled)
OpenJ9 - ce04759
OMR - e32d21b
JCL - b3e5f08 based on jdk-11.0.24+4)
and the earliest failing version (2/500 failures, from the May 28 nightly run) was:
openjdk version "11.0.24-internal" 2024-07-16
OpenJDK Runtime Environment (build 11.0.24-internal+0-adhoc..BuildJDK11x86-64linuxjitPersonal)
Eclipse OpenJ9 VM (build master-4ff10c9, JRE 11 Linux amd64-64-Bit Compressed References 20240528_1858 (JIT enabled, AOT enabled)
OpenJ9 - 4ff10c9
OMR - 01374a4
JCL - b3e5f08 based on jdk-11.0.24+4)
The corresponding ranges are ce04759...4ff10c9 for openj9 and https://github.com/eclipse/omr/compare/e32d21b...01374a4 for OMR.
In the range given in https://github.com/eclipse-openj9/openj9/issues/19610#issuecomment-2152982016, there is https://github.com/eclipse-openj9/openj9/pull/19545, which re-enables the IProfiler for JITServer clients. If that is the cause, then the IProfiler being disabled may have masked the failure earlier. https://github.com/eclipse-openj9/openj9/pull/19355 (merged as bfaeffa721246eeed84345ce1dea45ad264385d7 on May 7) is what disabled the IProfiler, and commit e92c6946102af608ff0711987f538331e4a9a290 re-enabled it, so that commit can be cherry-picked to re-enable the IProfiler at other points in the repository history.
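For reference, the cherry-pick step itself is just the following (a sketch assuming a local openj9 checkout; building the JDK and submitting the grinder are separate steps not shown here):

```java
import java.io.IOException;

// Sketch of re-enabling the IProfiler on top of an arbitrary openj9 commit by
// cherry-picking e92c6946, assuming this runs inside a local openj9 clone.
public class ReenableIProfiler {
    public static void main(String[] args) throws IOException, InterruptedException {
        String commitUnderTest = args[0];                       // the openj9 commit being bisected
        run("git", "checkout", commitUnderTest);
        run("git", "cherry-pick", "e92c6946102af608ff0711987f538331e4a9a290");
    }

    static void run(String... command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).inheritIO().start();
        if (p.waitFor() != 0) {
            throw new RuntimeException("command failed: " + String.join(" ", command));
        }
    }
}
```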
Some preliminary bisection results, with e92c6946102af608ff0711987f538331e4a9a290 cherry-picked onto the indicated openj9 commit:
That possibly narrows it down to 2da7e0a283c65b1ce458621737c2f0f61bfcc3ca...ec6dc8e0b598e870d1c70388e0a1d4df45ea118e and https://github.com/eclipse/omr/compare/dd4c5d5459a1e2a890f18b68d5c6b7b105b658d0...e32d21b256728bc90598eb68fddfebbf52f8b740.
The failure rate in the grinder seems to be quite low, which makes me less confident in the lower bound of these ranges. It's possible that the initial 11/100 failure grinder I linked to in https://github.com/eclipse-openj9/openj9/issues/19610#issue-2331885111 was simply an unlucky run with an unusually high number of failures, since the rate has usually been around 2/500 in subsequent testing.
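To put a rough number on that: if the true per-iteration failure probability is about 2/500, even a genuinely affected build has a reasonable chance of passing a 500x grinder cleanly. A quick back-of-the-envelope calculation (the rate here is an assumption taken from the grinder results above, not a measured constant):

```java
// Back-of-the-envelope estimate of how often a 500x grinder would show zero
// failures even though the build is affected, assuming independent iterations.
public class FalseNegativeEstimate {
    public static void main(String[] args) {
        double failureRate = 2.0 / 500.0;                      // ~0.4%, taken from the failing grinders
        int iterations = 500;
        double pAllPass = Math.pow(1.0 - failureRate, iterations);
        // Prints roughly 0.135, i.e. a ~13-14% chance that an affected build
        // still produces a clean 500x run, which is why a single 0/500 result
        // is weak evidence for the lower bound of the bisection range.
        System.out.printf("P(0 failures in %d iterations) = %.3f%n", iterations, pAllPass);
    }
}
```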
This did fail again in https://hyc-runtimes-jenkins.swg-devops.com/job/Test_openjdk17_j9_sanity.functional_x86-64_linux_jit_Personal/1019/.
I'm also not sure where this test is defined. Edit - this is an internal test.
I got a failure (1/475, though one iteration had an infra failure halfway through) with 0e57b7a83a824e024f7c5f4e8f6eba69a85401d8 and https://github.com/eclipse/omr/commit/5faa3ec5fa4166271c7c9ac24acdb3eb508fee3b, and got 0/500 failures with 0fcba08e6f9e21add71098f96c65abd61147473d and https://github.com/eclipse/omr/commit/5faa3ec5fa4166271c7c9ac24acdb3eb508fee3b. (EDIT - those builds were based on the indicated commit, with https://github.com/eclipse-openj9/openj9/commit/e92c6946102af608ff0711987f538331e4a9a290 cherry-picked on top).
That's the range 0fcba08e6f9e21add71098f96c65abd61147473d...0e57b7a83a824e024f7c5f4e8f6eba69a85401d8, so it's possible it's https://github.com/eclipse-openj9/openj9/pull/19523, but I'm still not very confident about the lower bound on the range. I'll grind a recent build to see the current failure rate, and also a build with those changes reverted.
The test seems to measure speed, though I might be wrong. If that's the case, is this a matter of JITServer performance rather than a functional bug?
Yes, the comments in the test code seem to indicate that it's a performance test. There's a TestHashTableCorrectness test defined in the same directory, which seems like the correctness-testing counterpart to TestHashTablePerformance.
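For what it's worth, a timing-threshold test of roughly the following shape (purely hypothetical; I haven't seen the internal test's source, and the workload size and limit here are made up) would explain why JIT or profiling changes show up as intermittent failures rather than crashes:

```java
import java.util.Hashtable;

// Hypothetical sketch of what a hash table performance test might look like;
// the real TestHashTablePerformance is internal and its thresholds are unknown.
public class HashTablePerfSketch {
    public static void main(String[] args) {
        final int OPERATIONS = 1_000_000;      // made-up workload size
        final long LIMIT_MS = 2_000;           // made-up time limit
        Hashtable<Integer, Integer> table = new Hashtable<>();
        long start = System.nanoTime();
        for (int i = 0; i < OPERATIONS; i++) {
            table.put(i, i);
            table.get(i);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // A test of this shape fails whenever the JIT compiles the hot loop
        // later or less aggressively than the threshold assumes, so profiling
        // or JITServer changes can surface as intermittent failures.
        if (elapsedMs > LIMIT_MS) {
            throw new AssertionError("took " + elapsedMs + " ms, limit was " + LIMIT_MS + " ms");
        }
        System.out.println("took " + elapsedMs + " ms");
    }
}
```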
I used a build based on https://github.com/eclipse/omr/commit/befee22 and 293f2b7812795bdf5ca611bc832bf5a1dc3d052a (two relatively recent commits), but with 0cb4aa5951855c28b6e6fc61e3d8570b6ff220a7 reverted, and I still got failures in this test, so I have no idea what the problem is.
Failure link: https://hyc-runtimes-jenkins.swg-devops.com/job/Test_openjdk11_j9_sanity.functional_x86-64_linux_jit_Personal/1678/
Optional info
Failure output (captured from console output)