adoptium / infrastructure

This repo contains all information about machine maintenance.
Apache License 2.0

JDK native sanity test fails on test-azure-win2012r2-x64-1 only #964

Closed adam-thorpe closed 4 years ago

adam-thorpe commented 4 years ago

When running on this Azure machine, the test fails consistently across multiple versions (JDK 11 and 13) and across x64, x32 and large heap builds. It doesn't seem to fail on any of the other machines. I would assume azure 2 and 3 would produce the same output, but they are currently offline.

See: https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/825/
Rebuild: https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=11&JDK_IMPL=openj9&BUILD_LIST=openjdk&JenkinsFile=openjdk_x86-64_windows&TARGET=jdk_native_sanity_0&LABEL=test-azure-win2012r2-x64-1

Test: native_sanity/simplenativelauncher/ProgramTest.java

13:21:22  STDERR:
13:21:22   stdout: [];
13:21:22   stderr: []
13:21:22   exitValue = 255
13:21:22  
13:21:22  java.lang.RuntimeException: Expected to get exit value of [0]
13:21:22  
13:21:22    at jdk.testlibrary.OutputAnalyzer.shouldHaveExitValue(OutputAnalyzer.java:375)
13:21:22    at ProgramTest.main(ProgramTest.java:40)
13:21:22    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
13:21:22    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
13:21:22    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
13:21:22    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
13:21:22    at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
13:21:22    at java.base/java.lang.Thread.run(Thread.java:831)
13:21:22  
13:21:22  JavaTest Message: Test threw exception: java.lang.RuntimeException
13:21:22  JavaTest Message: shutting down test

karianna commented 4 years ago

https://github.com/AdoptOpenJDK/openjdk-tests/issues/1396

sxa commented 4 years ago

The last comment is different, but this issue appears to be the same (and has a TRSS link which shows which machines it has passed/failed on): https://github.com/AdoptOpenJDK/openjdk-tests/issues/1471

adam-thorpe commented 4 years ago

Seems I forgot to exclude this when I made the issue, which is why it is still failing in tests. I will put in a merge request for it later.

lumpfish commented 4 years ago

If we exclude it, it won't run anywhere, i.e. not even on the non-Azure machines where it runs successfully. Better to find out why it fails on the Azure machines.

adam-thorpe commented 4 years ago

True, but the common approach is to raise an issue and exclude the test so that the nightly tests are green (which makes newer failures easier to spot). The bug will be dealt with and tracked by this issue, and once it is resolved the test can be un-excluded.

smlambert commented 4 years ago

@adam-thorpe is correct. The expectation is that the test is excluded and some individual takes ownership of the issue, makes a change in their branch to re-include the test as they investigate, and 'grinds' the test on various machines, making notes and updates in the issue. When they arrive at the solution (which in this case may be to correct the configuration of the Azure machine), they can merge their 're-inclusion' of the test.

Willsparker commented 4 years ago

@adam-thorpe I've rebuilt on Grinder, using your link and @smlambert's instructions from here: https://adoptopenjdk.slack.com/archives/C5219G28G/p1577978195006500?thread_ts=1577959133.006000&cid=C5219G28G

It appeared to work on both test-azure-win2012r2-x64-1 and -3: https://ci.adoptopenjdk.net/job/Grinder/1566/ https://ci.adoptopenjdk.net/job/Grinder/1567/

adam-thorpe commented 4 years ago

That's my bad: because the test was excluded, supplying native_sanity as the target won't run the test. Recreated here: https://ci.adoptopenjdk.net/job/Grinder/1584/console

Willsparker commented 4 years ago

The rebuild of 1584 worked on test-azure-win2012r2-x64-3: https://ci.adoptopenjdk.net/job/Grinder/1587/

so it only affects -1 as far as we can test.

adam-thorpe commented 4 years ago

Okay, so this test compiles some very basic C++ code which should print "hello" and exit with code 0. Instead, exit code 255 is returned. After doing some digging, it would seem that this is a common exit code when C++ code fails to compile. @Willsparker can you check that the VS versions are all installed correctly on this machine?
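To make the shape of that check concrete, here is a minimal Python sketch. It is not the actual ProgramTest.java (which is a Java jtreg test); the helper names `run_native` and `check_exit_value` are made up for illustration, and a child Python process stands in for the native launcher.

```python
import subprocess
import sys


def run_native(cmd):
    """Run a command and return (exit_value, stdout, stderr) as text."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


def check_exit_value(cmd, expected=0):
    """Raise RuntimeError if the command does not exit with `expected`."""
    exit_value, out, err = run_native(cmd)
    if exit_value != expected:
        raise RuntimeError(
            f"Expected to get exit value of [{expected}], got [{exit_value}]; "
            f"stdout: [{out.strip()}]; stderr: [{err.strip()}]"
        )
    return out


if __name__ == "__main__":
    # Stand-in for the native launcher (assumption): a child process printing hello.
    print(check_exit_value([sys.executable, "-c", "print('hello')"]))
```

Running the real executable by hand on the affected machine, outside the test harness, is the quickest way to see whether the failure comes from the program itself or from something preventing it from starting.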

Willsparker commented 4 years ago

Looking over the test on the machine, and running ./sanity_SimpleNativeLauncher.exe has output this:

[Screenshot, 2020-01-07 15:52:10; image not preserved]

and VCRUNTIME140.dll doesn't appear in C:/Windows/System32 nor in either of the Program Files folders. Comparing azure-1 to azure-3, MSVS2017 hasn't been installed on azure-1, so I'll do that and see if that fixes the issue.
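A check like this can be scripted so it is easy to repeat across machines. The following is a rough Python sketch under stated assumptions: `find_dll` is a hypothetical helper, the directory list is only an example, and the search is non-recursive, so nested folders would need to be listed explicitly.

```python
from pathlib import Path


def find_dll(dll_name, search_dirs):
    """Return the directories (non-recursive) that contain dll_name, ignoring case."""
    wanted = dll_name.lower()
    hits = []
    for directory in map(Path, search_dirs):
        if not directory.is_dir():
            continue  # skip folders that don't exist on this machine
        if any(entry.name.lower() == wanted for entry in directory.iterdir()):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    # Hypothetical directory list and placeholder file name; adjust both as needed.
    dirs = ["C:/Windows/System32", "C:/Windows/SysWOW64"]
    print(find_dll("example.dll", dirs))
```

Passing the runtime DLL name reported by the failing executable and comparing the output between a passing and a failing machine shows at a glance which one is missing the file.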

adam-thorpe commented 4 years ago

It's missing MSVS 14.0, which is 2015, is it not?

sxa commented 4 years ago

Aaaaaaahhhhhh ...

I was trying to figure out how this was failing as I had assumed it was built within the job, but that .exe comes from the testimage artefact so was built on the build machine with a version of the compiler that isn't on the test machine.

So yeah chucking the extra Visual Studio version on the machine should resolve this. I guess azure-1 was set up before we started installing the later version.

adam-thorpe commented 4 years ago

Ah, that makes sense. And yeah, Will figured out that azure 2 and 3 were added later, so that's why they have the right version and 1 doesn't.

Willsparker commented 4 years ago

There's no harm in installing both anyway :-)

Willsparker commented 4 years ago

https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/1654/

Installed MSVS2017 and it passed on test-azure-win2012r2-x64-1. Closing issue :-)