adoptium / aqa-tests

Home of test infrastructure for Adoptium builds
https://adoptium.net/aqavit
Apache License 2.0

vector issue on jdk17 win32 #3279

Open sophia-guo opened 2 years ago

sophia-guo commented 2 years ago

jdk/incubator/vector/Short256VectorTests.java.Short256VectorTests
jdk/incubator/vector/ShortMaxVectorTests.java.ShortMaxVectorTests

Both tests failed the job with error:

09:03:43  test Short256VectorTests.divShort256VectorTestsMasked(short[-i * 5], short[cornerCaseValue(i)], mask[i % 2]): failure
09:03:43  java.lang.ArithmeticException: zero vector lane in dividend [32767, -32768, -32768, 32767, 1, 32767, -32768, -32768, 32767, 0, 32767, -32768, -32768, 32767, 1, 32767]
09:03:43    at jdk.incubator.vector/jdk.incubator.vector.AbstractVector.divZeroException(AbstractVector.java:494)
09:03:43    at jdk.incubator.vector/jdk.incubator.vector.ShortVector.lanewiseTemplate(ShortVector.java:615)
09:03:43    at jdk.incubator.vector/jdk.incubator.vector.Short256Vector.lanewise(Short256Vector.java:279)
09:03:43    at jdk.incubator.vector/jdk.incubator.vector.Short256Vector.lanewise(Short256Vector.java:41)
09:03:43    at jdk.incubator.vector/jdk.incubator.vector.ShortVector.lanewise(ShortVector.java:673)
09:03:43    at jdk.incubator.vector/jdk.incubator.vector.ShortVector.div(ShortVector.java:1349)
09:03:43    at Short256VectorTests.divShort256VectorTestsMasked(Short256VectorTests.java:1606)
09:03:43    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
09:03:43    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
09:03:43    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
09:03:43    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
09:03:43    at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
09:03:43    at org.testng.internal.TestInvoker.invokeMethod(TestInvoker.java:599)
09:03:43    at org.testng.internal.TestInvoker.invokeTestMethod(TestInvoker.java:174)
09:03:43    at org.testng.internal.MethodRunner.runInSequence(MethodRunner.java:46)

https://ci.adoptopenjdk.net/job/Test_openjdk17_hs_extended.openjdk_x86-32_windows_testList_1/26/consoleFull
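For anyone reading the trace: the vector printed in the exception has a single zero, at lane 9. Assuming that vector is the divisor operand and that `mask[i % 2]` sets the even lanes, lane 9 is unset, so a masked divide would not be expected to throw here. The scalar model below is only a sketch of that reasoning. It does not use jdk.incubator.vector, and the class and method names (`MaskedDivModel`, `evenLanesMask`, `dividend`, `hasZeroInSetLane`, `maskedDiv`) are hypothetical.

```java
import java.util.Arrays;

// Scalar model of a masked lane-wise short division (NOT the real Vector API).
final class MaskedDivModel {

    // Vector printed in the ArithmeticException above; assumed to be the divisor operand.
    public static final short[] DIVISOR = {
        32767, -32768, -32768, 32767, 1, 32767, -32768, -32768,
        32767, 0, 32767, -32768, -32768, 32767, 1, 32767
    };

    // Assumption: "mask[i % 2]" in the test name means the even lanes are set.
    public static boolean[] evenLanesMask(int lanes) {
        boolean[] mask = new boolean[lanes];
        for (int i = 0; i < lanes; i++) {
            mask[i] = (i % 2) == 0;
        }
        return mask;
    }

    // Assumption: "short[-i * 5]" in the test name fills lane i with (short) (-i * 5).
    public static short[] dividend(int lanes) {
        short[] a = new short[lanes];
        for (int i = 0; i < lanes; i++) {
            a[i] = (short) (-i * 5);
        }
        return a;
    }

    // True only when a zero divisor sits in a lane that the mask selects.
    public static boolean hasZeroInSetLane(short[] divisor, boolean[] mask) {
        for (int i = 0; i < divisor.length; i++) {
            if (mask[i] && divisor[i] == 0) {
                return true;
            }
        }
        return false;
    }

    // Divides in set lanes and keeps the dividend in unset lanes;
    // throws only for a zero divisor in a set lane.
    public static short[] maskedDiv(short[] a, short[] b, boolean[] mask) {
        if (hasZeroInSetLane(b, mask)) {
            throw new ArithmeticException("zero divisor in a set lane");
        }
        short[] result = new short[a.length];
        for (int i = 0; i < a.length; i++) {
            result[i] = mask[i] ? (short) (a[i] / b[i]) : a[i];
        }
        return result;
    }

    public static void main(String[] args) {
        boolean[] mask = evenLanesMask(DIVISOR.length);
        System.out.println("zero divisor in a set lane: " + hasZeroInSetLane(DIVISOR, mask));
        System.out.println(Arrays.toString(maskedDiv(dividend(DIVISOR.length), DIVISOR, mask)));
    }
}
```

Under those assumptions the model reports no zero divisor in a set lane and completes without an exception. A real standalone reproducer would need to call `ShortVector.div` with a mask on the affected win32 build.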

sophia-guo commented 2 years ago

Failures are not machine related. https://ci.adoptopenjdk.net/job/Grinder/3314/ https://ci.adoptopenjdk.net/job/Grinder/3317/

Tests passed with earlier releases https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.1%2B12/OpenJDK17U-jdk_x86-32_windows_hotspot_17.0.1_12.zip https://ci.adoptopenjdk.net/job/Grinder/3312/ https://ci.adoptopenjdk.net/job/Grinder/3313/

Neither test case has been updated since the last release.

smlambert commented 2 years ago

Also running same test against a different vendor build (https://cdn.azul.com/zulu/bin/zulu17.32.13-ca-jdk17.0.2-win_i686.zip in https://ci.adoptopenjdk.net/job/Grinder/3343) to see how it behaves.

Edit: Grinder/3343 passes

sxa commented 2 years ago

@smlambert Vector test suite failed in a re-build of 17.0.1 at https://ci.adoptopenjdk.net/job/Test_openjdk17_hs_extended.openjdk_x86-32_windows_testList_1/29/console

sxa commented 2 years ago

@smlambert Can you take a look at the output from the above job and confirm my evaluation here? If so we should think about what we want to do as the next action. e.g. We could get someone to attempt a rebuild using an old version of the temurin-build scripts and see if that has the same behaviour.

smlambert commented 2 years ago

We have published win32 jdk17 as is, given the tests that have regressed are not mainstream usage and this investigation will take time.

Working with 2 builds for a detailed comparison:

At a glance the builds are very similar (same size, believed to be built from the same source code at the same tags, running the same test material). Taking a closer look with dumpbin to determine how they vary. Will report findings here as part of this investigation.

smlambert commented 2 years ago

Things we know:

sxa commented 2 years ago

The other thing that could have been different is which machine they were built on. If other avenues prove fruitless and we don't have the information about which machine they were built on, we could try building it on each build machine and running the test against it ...

smlambert commented 2 years ago

Attaching dump files of the DLLs which, when diffed, indicate some things are different (many of the api-ms ones are pretty close to identical, others hold more differences). My thought was to reduce the testcase to a standalone, then do a 'binary search' approach of swapping out half the DLLs, from a working binary to a failing binary, see how the testcase behaves, and keep narrowing it down that way. But ya, feel free to also pursue other approaches... (this is where the SSDF and SBOM info would be tremendously handy)... dumps.zip
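A rough sketch of the 'binary search' idea, for illustration only. It assumes exactly one DLL is responsible and that the failure is deterministic. The class `DllBisect`, the method `findCulprit`, and the predicate are hypothetical; in practice the predicate would copy the listed DLLs from the failing build into the working one and run the standalone testcase, which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Bisects a list of DLL names down to a single suspect.
final class DllBisect {

    // failsWhenSwapped.test(subset) must return true when the testcase fails
    // with only that subset of DLLs taken from the failing build.
    public static String findCulprit(List<String> dlls, Predicate<List<String>> failsWhenSwapped) {
        if (dlls.isEmpty()) {
            throw new IllegalArgumentException("no DLLs to bisect");
        }
        List<String> suspects = new ArrayList<>(dlls);
        while (suspects.size() > 1) {
            int mid = suspects.size() / 2;
            List<String> firstHalf = new ArrayList<>(suspects.subList(0, mid));
            List<String> secondHalf = new ArrayList<>(suspects.subList(mid, suspects.size()));
            // Under the single-culprit assumption, if swapping the first half does not
            // reproduce the failure, the culprit must be in the second half.
            suspects = failsWhenSwapped.test(firstHalf) ? firstHalf : secondHalf;
        }
        return suspects.get(0);
    }

    public static void main(String[] args) {
        // Illustrative names only; the real list would come from the build's bin directory.
        List<String> dlls = List.of("jvm.dll", "java.dll", "net.dll", "nio.dll", "zip.dll");
        // Stand-in predicate that pretends nio.dll is the culprit.
        String culprit = findCulprit(dlls, swapped -> swapped.contains("nio.dll"));
        System.out.println("suspect: " + culprit);
    }
}
```

Each round halves the suspect list, so narrowing N DLLs takes about log2(N) test runs. If the failure depends on a combination of DLLs, this simple scheme would not find it.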

sxa commented 2 years ago

Build machine   Test results
ibmcloud-2      Vector tests failed
azure-2         Vector tests failed
alibaba-1       Vector tests re-running (job 37 failed)

smlambert commented 2 years ago

Noting same tests fail on arm_linux (aarch32), as noted in jdk17 triage & jdk18 triage, tracked under https://github.com/adoptium/aqa-tests/issues/2874.

smlambert commented 7 months ago

@sophia-guo - adding the 'more triage required' tag as this is a candidate for further scrutiny.

sxa commented 7 months ago

> Noting same tests fail on arm_linux (aarch32)

Interesting. In that case it smacks of being a generic 32-bit issue, I feel. That said, according to this issue it started failing between 17.0.1 and 17.0.2, whereas the arm32 one mentions crashes in 17+35.