adoptium / aqa-tests

Home of test infrastructure for Adoptium builds
https://adoptium.net/aqavit
Apache License 2.0

JDK8: ppc64le_linux: Test8009761.java fails with: init recursive calls: 38. After deopt 37 #5221

Open adamfarley opened 4 months ago

adamfarley commented 4 months ago

Summary

Test8009761.java fails whenever it is run on a ppc64le_linux Ubuntu 1804 machine, but appears to pass everywhere else.

[2023-07-19T16:35:21.776Z] STDOUT:
[2023-07-19T16:35:21.776Z] CompilerOracle: exclude Test8009761.m2
[2023-07-19T16:35:21.776Z] Failed: init recursive calls: 38. After deopt 37

Example: https://trss.adoptium.net/output/test?id=64b83aef17052c671586e467
Deep History: https://trss.adoptium.net/deepHistory?testId=6615e317879917006efa06ea
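
For readers unfamiliar with the failure message: it appears to compare the number of recursive calls that fit on the stack before a StackOverflowError in an initial run ("init recursive calls") against the number reached after a deoptimization ("After deopt"), and the JDK 8 version of the test expects the two to be equal. The sketch below is a hypothetical, heavily simplified probe of that idea. The class and method names are made up and are not taken from Test8009761. The real test uses much larger frames, which is why its counts are in the tens rather than the thousands.

```java
// Hypothetical, simplified probe: NOT the code of Test8009761.
public class RecursionDepthProbe {

    private static int depth;

    // Each call adds one frame and bumps the counter until the stack runs out.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Returns how many nested calls fit before StackOverflowError was thrown.
    static int measure() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError expected) {
            // Expected: the stack is exhausted and depth holds the count.
        }
        return depth;
    }

    public static void main(String[] args) {
        int first = measure();
        // The JIT may compile recurse() between runs, which changes the frame
        // size, so the second count is not guaranteed to equal the first.
        int second = measure();
        System.out.println("first run: " + first + ", second run: " + second);
    }
}
```

Because interpreted and compiled frames can differ in size, a strict equality check on such counts is sensitive to small layout differences between platforms.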

Details

From 2023-07-19 (possibly earlier) to now, Test8009761 has always failed with this error when it is run on one of our ppc64le_linux Ubuntu 1804 machines, such as test-osuosl-ubuntu1804-ppc64le-2.

OSes on which this test has been shown to pass include:

As far as I can tell, Test8009761 has been failing for many years, but was ignored/excluded because of an issue (link) that affected many compiler tests on all ppc64le machines.

It seems this issue was resolved here, and the test was then unexcluded here on 2023-02-07. There are no records between then and 2023-07-19, so I don't think we can be sure when the "init recursive" problem started. The stack issue may have concealed it, either (a) by making the test fail before the recursion check was reached, or (b) by causing so many failures on the non-Ubuntu-1804 machines that the recursion failures were drowned out.

Also, here are some examples of past failures of this test, and how they were fixed:

I'm not currently seeing an unresolved upstream bug that looks identical to this issue.

Machine stats

In case this is not an OS issue but rather a throughput issue, I pulled some stats for the failing and passing machines. Sadly, I don't see a pattern. The numbers are below, in case someone has another theory.

| Data | test-osuosl-ubuntu1804-ppc64le-2 (fail) | test-osuosl-ubuntu2004-ppc64le-1 | test-skytap-ubuntu2004-ppc64le-1 | test-docker-debian11-ppc64le-1 | test-osuosl-ubuntu1604-ppc64le-1 |
|---|---|---|---|---|---|
| Free Physical Memory Size (bytes) | 4750508032 | 278855680 | 387776512 | 4056809472 | 5065408512 |
| Free space (bytes) | 30003261440 | 15444140032 | 409540165632 | 309474676736 | 55724167168 |
| Total Physical Memory Size (bytes) | 8556511232 | 4252565504 | 8488419328 | 6442450944 | 8559984640 |
| Total space (bytes) | 84449624064 | 41551020032 | 422622445568 | 422548791296 | 83203571712 |
| Usable space (bytes) | 26508455936 | 15427362816 | 388065382400 | 287999893504 | 55707389952 |
| cpuCores | 4 | 2 | 32 | 32 | 4 |
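
The row names in these stats look like they map onto standard JVM APIs: the physical memory getters of `com.sun.management.OperatingSystemMXBean`, the `java.io.File` space methods, and `Runtime.availableProcessors()`. Assuming that is where they came from (this thread does not say), a minimal sketch for collecting the same numbers on another machine might look like this. The class name, helper names, and the choice of `/` as the file system root are hypothetical.

```java
import java.io.File;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper that prints stats similar to the table rows above.
public class MachineStats {

    // Logical processors visible to the JVM (assumed to correspond to "cpuCores").
    static int cpuCores() {
        return Runtime.getRuntime().availableProcessors();
    }

    // {total, free, usable} bytes for the file system holding the given path.
    static long[] diskStats(File path) {
        return new long[] {path.getTotalSpace(), path.getFreeSpace(), path.getUsableSpace()};
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // The physical memory getters live in the com.sun.management extension,
        // which not every JVM is guaranteed to provide, so check before casting.
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            com.sun.management.OperatingSystemMXBean sunOs =
                    (com.sun.management.OperatingSystemMXBean) os;
            System.out.println("Free Physical Memory Size  " + sunOs.getFreePhysicalMemorySize());
            System.out.println("Total Physical Memory Size " + sunOs.getTotalPhysicalMemorySize());
        }
        // Assumption: the disk numbers refer to the root file system.
        long[] disk = diskStats(new File("/"));
        System.out.println("Total space (bytes)  " + disk[0]);
        System.out.println("Free space (bytes)   " + disk[1]);
        System.out.println("Usable space (bytes) " + disk[2]);
        System.out.println("cpuCores " + cpuCores());
    }
}
```

On newer JDKs the physical memory getters are deprecated in favour of `getFreeMemorySize()` and `getTotalMemorySize()`, but the older names still compile, which keeps the sketch usable on JDK 8.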
adamfarley commented 4 months ago

Here are a couple of Grinders for this test, each one running this specific unit test 100 times.

Also, for the last 12 runs of this test target, here are the pass/fails:

adamfarley commented 4 months ago

Ok, the Grinders were conclusive.

100/100 of the tests on test-osuosl-ubuntu2004-ppc64le-1 passed. 100/100 of the tests on test-osuosl-ubuntu1804-ppc64le-1 failed with exactly this issue.

jiekang commented 4 months ago

This test was rewritten in JDK 9 and allows the call count to be off by one. That might allow us to pass this test on this one special platform.

For our own purposes, I wouldn't consider this test failure a blocker, so there wouldn't be any priority action to resolve it. The rewrite was 10 years ago now... Maybe it would be nice to backport the assertion adjustment to 8?

JDK 8: https://github.com/openjdk/jdk8u-dev/blob/cde8aca6cb0fae77b9300b9d65d094a4f74e4d53/hotspot/test/compiler/8009761/Test8009761.java#L249

Head: https://github.com/openjdk/jdk/blob/140f56718bbbfc31bb0c39255c68568fad285a1f/test/hotspot/jtreg/compiler/uncommontrap/Test8009761.java#L284

The fix was to allow the count to be off by one, lol.
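
The difference between the JDK 8 check and the relaxed JDK 9 check can be sketched as below. This is a hypothetical illustration, not the upstream test code: the class and method names are made up, and it assumes the relaxed check accepts a difference of one in either direction, which should be confirmed against the JDK 9 source linked above.

```java
// Hypothetical illustration: NOT the upstream Test8009761 code.
public class RecursionCountCheck {

    // Strict check, in the spirit of the JDK 8 test: any difference fails.
    static boolean strictMatch(int initCalls, int afterDeoptCalls) {
        return initCalls == afterDeoptCalls;
    }

    // Relaxed check, in the spirit of the JDK 9 rewrite: tolerate a
    // difference of one call (assumed here to apply in either direction).
    static boolean tolerantMatch(int initCalls, int afterDeoptCalls) {
        return Math.abs(initCalls - afterDeoptCalls) <= 1;
    }

    public static void main(String[] args) {
        int initCalls = 38;       // counts from the failing log in this issue
        int afterDeoptCalls = 37;
        System.out.println("strict:   " + strictMatch(initCalls, afterDeoptCalls));
        System.out.println("tolerant: " + tolerantMatch(initCalls, afterDeoptCalls));
    }
}
```

With the counts from the failing log (38 and 37), the strict check fails and the tolerant check passes, which matches the idea that backporting the relaxed assertion would let the test pass on the Ubuntu 1804 ppc64le machines.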

adamfarley commented 4 months ago

Good catch Jie. :)

Looks like the fix here was associated with the issue here. It was meant to be backported to JDK8, but got deferred and forgotten.

Will exclude the test for now, and test/backport the fix once the release is done. Assigning this to my best guess at the correct post-release iteration, so I don't forget to handle it.

adamfarley commented 4 months ago

Master exclusion: https://github.com/adoptium/aqa-tests/pull/5228
Once approved and merged, this will be cherry-picked and merged into the latest release branch. (Update: Done here.)

After the release, the fix for this will be backported in Q2 Iteration 3.

sendaoYan commented 4 months ago

Hi, how can I reproduce this failure? I don't have a ppc64le_linux machine.

smlambert commented 4 months ago

Hi @sendaoYan - look for an invitation to a test-triage team. Once you accept the invitation, you can try running these Grinder jobs on Jenkins:

If you need direct access to either test machine, you would raise an infrastructure issue to request it.

sendaoYan commented 4 months ago

Thanks.

adamfarley commented 4 months ago

Upstream bug raised here: https://bugs.openjdk.org/browse/JDK-8330973
Upstream PR raised here: https://github.com/openjdk/jdk8u-dev/pull/487

adamfarley commented 4 months ago

Severin has asked for a backport of the full JDK9 fix, rather than the minimal version, so I'm checking the backport to make sure it's clean.

adamfarley commented 3 months ago

Looking into the creation of the backport PR now.

Commit generated here. Testing underway. Links to follow.

https://ci.adoptium.net/job/Grinder/9898/console

Note: May need re-launching if the relative test path is incorrect.

adamfarley commented 1 month ago

Resuming this task. Here's a grinder rerun: https://ci.adoptium.net/job/Grinder/10554/

Update: Test run passed. Creating upstream PR and updating the associated bug.

adamfarley commented 1 day ago

Update: The upstream PR has been merged into jdk8u-dev. Will unexclude this test once the change is merged into jdk8u.