Open adamfarley opened 4 months ago
Here are a couple of grinders for this test, each one a 100x grinder of this specific unit test.
Also, for the last 12 runs of this test target, here are the pass/fails:
Pass:
- test-docker-debian11-ppc64le-1
- test-docker-debian11-ppc64le-3 x2
- test-docker-ubuntu2204-ppc64le-1
- test-docker-ubuntu2204-ppc64le-2
- test-osuosl-ubuntu2004-ppc64le-1
- test-osuosl-ubuntu1604-ppc64le-1 x2
- test-skytap-ubuntu2004-ppc64le-1

Fail:
- test-docker-ubuntu1804-ppc64le-1
- test-osuosl-ubuntu1804-ppc64le-2 x2
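To make the OS pattern in the 12 runs above easier to see, here is a minimal sketch that groups those results by OS. It assumes machine names follow a `test-<provider>-<os>-<arch>-<n>` shape (inferred from the names listed, not from any documented convention), and `RunTally`, `Run`, `osOf`, `tallyByOs` and `sampleRuns` are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RunTally {
    /** One test run: the machine it ran on and whether it passed. */
    static final class Run {
        final String machine;
        final boolean passed;
        Run(String machine, boolean passed) {
            this.machine = machine;
            this.passed = passed;
        }
    }

    /** Extracts the OS token, assuming names look like test-<provider>-<os>-<arch>-<n>. */
    static String osOf(String machine) {
        String[] parts = machine.split("-");
        return parts.length >= 3 ? parts[2] : machine;
    }

    /** Returns OS -> {passes, failures}, in first-seen order. */
    static Map<String, int[]> tallyByOs(List<Run> runs) {
        Map<String, int[]> tally = new LinkedHashMap<>();
        for (Run r : runs) {
            int[] counts = tally.computeIfAbsent(osOf(r.machine), k -> new int[2]);
            counts[r.passed ? 0 : 1]++;
        }
        return tally;
    }

    /** The 12 runs listed above, with "x2" entries expanded into two runs. */
    static List<Run> sampleRuns() {
        return Arrays.asList(
            new Run("test-docker-debian11-ppc64le-1", true),
            new Run("test-docker-debian11-ppc64le-3", true),
            new Run("test-docker-debian11-ppc64le-3", true),
            new Run("test-docker-ubuntu2204-ppc64le-1", true),
            new Run("test-docker-ubuntu2204-ppc64le-2", true),
            new Run("test-osuosl-ubuntu2004-ppc64le-1", true),
            new Run("test-osuosl-ubuntu1604-ppc64le-1", true),
            new Run("test-osuosl-ubuntu1604-ppc64le-1", true),
            new Run("test-skytap-ubuntu2004-ppc64le-1", true),
            new Run("test-docker-ubuntu1804-ppc64le-1", false),
            new Run("test-osuosl-ubuntu1804-ppc64le-2", false),
            new Run("test-osuosl-ubuntu1804-ppc64le-2", false));
    }

    public static void main(String[] args) {
        tallyByOs(sampleRuns()).forEach((os, c) ->
            System.out.println(os + ": " + c[0] + " passed, " + c[1] + " failed"));
    }
}
```

Grouped this way, every failure in the sample lands on ubuntu1804 and no other OS has a failure, which is what prompted the Grinder comparison.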
Ok, the Grinders were conclusive.
100/100 of the tests on test-osuosl-ubuntu2004-ppc64le-1 passed. 100/100 of the tests on test-osuosl-ubuntu1804-ppc64le-1 failed with exactly this issue.
This test was rewritten in JDK 9 and allows the call count to be off by one. That might allow us to pass this test on this one special platform.
For our own purposes I wouldn't consider this test failure a blocker, so there wouldn't be any priority action to resolve it. The rewrite was 10 years ago now... Maybe it would be nice to backport the adjustment to the assert to JDK 8?
Head: https://github.com/openjdk/jdk/blob/140f56718bbbfc31bb0c39255c68568fad285a1f/test/hotspot/jtreg/compiler/uncommontrap/Test8009761.java#L284 Fix was to allow the count to be off by one, lol..
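For anyone skimming, an "off by one" tolerance of the kind described could look roughly like the sketch below. This is not the code from Test8009761.java; `CallCountCheck`, `withinTolerance` and `check` are hypothetical names, and the exception message is made up for illustration.

```java
final class CallCountCheck {
    /** True when the two counts differ by at most {@code tolerance}. */
    static boolean withinTolerance(int before, int after, int tolerance) {
        // Widen to long so the subtraction cannot overflow for extreme ints.
        return Math.abs((long) before - after) <= tolerance;
    }

    /** Throws if the counts differ by more than one (the assumed tolerance). */
    static void check(int before, int after) {
        if (!withinTolerance(before, after, 1)) {
            throw new RuntimeException(
                "call counts differ by more than 1: " + before + " vs " + after);
        }
    }

    public static void main(String[] args) {
        check(100, 101); // off by one: accepted
        check(100, 100); // equal: accepted
        try {
            check(100, 102); // off by two: rejected
        } catch (RuntimeException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A strict equality check would reject the first call in `main`, which is the situation a one-call difference between the two counts would trigger on the failing machines.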
Good catch Jie. :)
Looks like the fix here was associated with the issue here. It was meant to be backported to JDK8, but got deferred and forgotten.
Will exclude for now, and test/backport the fix after the release is resolved. Assigning to my best guesstimate of the correct post-release iteration, so I don't forget to handle this.
Master exclusion: https://github.com/adoptium/aqa-tests/pull/5228 Will be cherry picked and merged into latest release branch once approved+merged. (Update: Done here)
After the release, the fix for this will be backported in Q2 Iteration 3.
Hi, how can I reproduce this failure? I do not have a ppc64le_linux machine.
Hi @sendaoYan - look for an invitation to a test-triage team. Once you accept the invitation, you can try running these Grinder jobs on Jenkins:
- Grinder job on 1804 node - runs on test-osuosl-ubuntu1804-ppc64le-1
- Grinder job on 2004 node - runs on test-osuosl-ubuntu2004-ppc64le-1
If you need direct access to either test machine, you would raise an infrastructure issue to request it.
Thanks.
Upstream bug raised here: https://bugs.openjdk.org/browse/JDK-8330973
Upstream PR raised here: https://github.com/openjdk/jdk8u-dev/pull/487
Severin has asked for a backport of the full JDK9 fix, rather than the minimal version, so I'm checking the backport to make sure it's clean.
Looking into the creation of the backport PR now.
Commit generated here. Testing underway. Links to follow.
https://ci.adoptium.net/job/Grinder/9898/console
Note: May need re-launching if the relative test path is incorrect.
Resuming this task. Here's a grinder rerun: https://ci.adoptium.net/job/Grinder/10554/
Update: Test run passed. Creating upstream PR and updating the associated bug.
Update: The upstream PR has been merged into jdk8u dev. Will unexclude this test once the change gets merged into jdk8u.
Summary
Test8009761.java fails whenever it is run on a ppc64le_linux Ubuntu 1804 machine, but appears to pass everywhere else.
Example: https://trss.adoptium.net/output/test?id=64b83aef17052c671586e467
Deep History: https://trss.adoptium.net/deepHistory?testId=6615e317879917006efa06ea
Details
From 2023-07-19 (possibly earlier) to now, Test8009761 has always failed with this error when it is run on one of our ppc64le_linux Ubuntu 1804 machines, such as test-osuosl-ubuntu1804-ppc64le-2.
OSes that this test has been shown to pass on include:
As far as I can tell, Test8009761 has been failing for many years, but was ignored/excluded due to an issue affecting many compiler tests (link), which affected it on all ppc64le machines.
It seems this issue was resolved here, and the test was then unexcluded here on 2023-02-07. Records between then and 2023-07-19 are not present, so I don't think we can be sure when the "init recursive" problem started. The stack issue may have concealed it by (a) causing a failure before the recursive issue could occur, or (b) causing so many failures on the non-Ubuntu-1804 machines that the recursive issues were drowned out.
Also, here are some examples of past failures of this test, and how they were fixed:
I'm not currently seeing an unresolved upstream bug that looks identical to this issue.
Machine stats
In case this was not an OS issue, but rather a throughput issue, I pulled out some stats for the failing/passing machines. I don't see a pattern, sadly. See below for the numbers, in case someone has another theory.