lumpfish opened this issue 3 years ago
Seems likely related to the memory on those machines. Next steps should probably be to verify the swap file settings and whether they can be increased to any effect; if not, we should look to increase the RAM on those systems to 6GB first, then 8GB if that doesn't work.
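Before touching the page file it might be worth confirming what the JVM on those nodes actually sees. A minimal sketch along these lines - the `com.sun.management` cast is an assumption on my part, as it's a non-standard extension, though both HotSpot and OpenJ9 builds ship it:

```java
import java.lang.management.ManagementFactory;

// Minimal sketch: report the memory a JVM on the node actually sees, to
// cross-check the figures Jenkins shows for the machine.
public class MemReport {
    public static void main(String[] args) {
        // Non-standard extension interface; shipped by HotSpot and OpenJ9
        // but not part of the java.lang.management standard API.
        com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean)
                        ManagementFactory.getOperatingSystemMXBean();
        System.out.println("physical RAM : " + os.getTotalPhysicalMemorySize());
        System.out.println("swap         : " + os.getTotalSwapSpaceSize());
        System.out.println("JVM max heap : " + Runtime.getRuntime().maxMemory());
    }
}
```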
> Seems likely related to the memory on those machines. Next steps should probably be to verify the swap file settings and whether they can be increased to any effect; if not, we should look to increase the RAM on those systems to 6GB first, then 8GB if that doesn't work.
Could also be filehandles.
> Could also be filehandles.
What determines available file handles on a per-machine basis? Is that in any way a default based on RAM size, or something else?
(I've disabled the win2016 system by removing ci.role.test until this can be debugged/diagnosed.)
> Could also be filehandles.
> What determines available file handles on a per-machine basis? Is that in any way a default based on RAM size, or something else?
On Windows? I've actually got no idea.
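For what it's worth, there's no single machine-wide file-handle setting on Windows the way there's a ulimit on Unix; the limits are essentially per process. A hypothetical probe to see where a JVM process actually gives up - the loop and temp-file approach are illustrative only, and on a 4GB box it may hit the heap before any handle limit, which would itself be telling:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical probe: keep opening streams on one temp file until something
// refuses, then report how many handles this process managed to obtain.
public class HandleProbe {
    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("handleprobe", ".tmp");
        f.deleteOnExit();
        List<FileOutputStream> open = new ArrayList<>();
        try {
            while (true) {
                open.add(new FileOutputStream(f, true));
            }
        } catch (Throwable t) {
            // Could be an IOException (handle exhaustion) or an
            // OutOfMemoryError (heap exhaustion) - both are informative here.
            System.out.println("opened " + open.size() + " handles before: " + t);
        } finally {
            for (FileOutputStream s : open) s.close();
        }
    }
}
```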
Testing here with swap space increased on test-azure-win2016-x64-1 (assuming it goes live without a reboot). If that doesn't work I'll increase the RAM to 6GB.
Hmmm, 2012r2-2 has 16GB of RAM. Running a Grinder on there too to verify.
So the Grinder on the win2016 box failed, but not with an obvious memory issue - @lumpfish can you check the log of that one to see if it's the same issue you've seen?
The win2012r2 did give an OutOfMemoryException - have made sure there is up to 12GB of swap and am re-running in this Grinder.
The Win2012 machine showed an OutOfMemory error during one of the tests (a different one in each run) in 7231 and 7237. I'm going to restart it, run the same test again while trying to watch the usage live on the machine, and then see how easy it is to increase to 6GB. [EDIT: no I won't, as Azure doesn't have 6GB options, so it'll have to be 8GB, which is almost twice the cost unfortunately... Maybe I'll just shut down the 2012 one and bump the 2016 up to the 8GB B2ms spec.]
> So the Grinder on the win2016 box failed, but not with an obvious memory issue - @lumpfish can you check the log of that one to see if it's the same issue you've seen?
That test is similar in that it runs multiple JVMs in parallel which share a class cache.
The stderr from the failing process (found by downloading the system_test_output.tar.gz file from the failing job, https://ci.adoptopenjdk.net/job/Grinder/7230/) contains:
```
JVMSHRC162E The wait for the creation mutex while opening shared memory has timed out
JVMSHRC662I Error recovery: destroyed semaphore set associated with shared class cache.
JVMSHRC840E Failed to start up the shared cache.
JVMJ9VM015W Initialization error for library j9shr29(11): JVMJ9VM009E J9VMDllMain failed
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
```
I've not seen (or noticed) that before.
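For context, the JVMSHRC162E message says the JVM timed out waiting for the creation mutex while opening the cache's shared memory, i.e. another process was holding it (or it was left stale). A minimal sketch of the pattern the test exercises - the cache name `probecache` and the count of 8 JVMs are made up for illustration, and it assumes an OpenJ9 `java` on the PATH:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: spawn several JVMs that all attach to one named OpenJ9
// shared class cache, mimicking the parallel-JVM pattern of the system test.
public class SharedCacheContention {
    public static void main(String[] args) throws Exception {
        List<Process> jvms = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            // Each child JVM attaches to (or creates) the same named cache;
            // the first one in takes the creation mutex the others wait on.
            jvms.add(new ProcessBuilder("java",
                    "-Xshareclasses:name=probecache", "-version")
                    .inheritIO().start());
        }
        for (Process p : jvms) {
            // A non-zero exit with JVMSHRC162E on stderr would reproduce the
            // creation-mutex timeout seen in the Grinder logs.
            System.out.println("exit=" + p.waitFor());
        }
    }
}
```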
Hmmm https://ci.adoptopenjdk.net/job/Grinder/7260/ ran through without any failure on azure-win2012r2-2 after an earlier reboot.
Although, trying again, this has popped up: Upgrade time then! (FYI @smlambert - looks like Windows tests can't complete on a 4GB Windows system.)
I've shut the Windows2012 machine down (it's also more expensive than the new ones I've set up, so shutting it down isn't a bad idea). I'm re-running a Grinder on the 2016 machine at 7268 since the previous one passed, and I'll look at bumping it up to 8GB if it fails (it will still be cheaper than the Win2012 one). [EDIT: 7268 passed - running again on the 4GB Win2016 box at 7277 and 7278.]
Side note: I'm also running a Grinder on one of the larger 2012 boxes at 7269 - mostly because I'm curious as to whether there are any performance differences on that one (but I suspect on the system test suites it won't make much difference).
7277 failed a test but did not throw a visible OutOfMemory error, so that's inconclusive.
7277 failed with the same mutex wait error:
```
JVMSHRC162E The wait for the creation mutex while opening shared memory has timed out
JVMSHRC662I Error recovery: destroyed semaphore set associated with shared class cache.
JVMSHRC840E Failed to start up the shared cache.
JVMJ9VM015W Initialization error for library j9shr29(11): JVMJ9VM009E J9VMDllMain failed
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
```
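If a JVM is killed while holding the cache, the stale semaphore/mutex has to be recovered by the next JVM (the JVMSHRC662I line above). One way to rule that out before re-running is to clear the caches on the node first. A sketch using ProcessBuilder - `listAllCaches` and `destroyAll` are documented -Xshareclasses suboptions, everything else here is illustrative:

```java
// Minimal sketch: list, then remove, any leftover shared class caches on a
// node before re-running the test. Assumes an OpenJ9 `java` on the PATH.
public class CacheCleanup {
    public static void main(String[] args) throws Exception {
        // Both suboptions make the JVM print/act and exit without running
        // an application class.
        new ProcessBuilder("java", "-Xshareclasses:listAllCaches")
                .inheritIO().start().waitFor();
        new ProcessBuilder("java", "-Xshareclasses:destroyAll")
                .inheritIO().start().waitFor();
    }
}
```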
Despite the above tests being inconclusive due to the failure on shared class setup, I'm going to go ahead with the upgrade.
Converted test-azure-win2016-x64-1 from B2s to B2ms. Back online with the ci.role.test label and queued up two Grinders, 7288 and 7299 - hopefully that will resolve the OutOfMemoryErrors if not the class cache issue.
I'm going to deprovision https://ci.adoptopenjdk.net/computer/test-azure-win2012r2-x64-2/ (test-2012r2-2 on the Azure portal) - we can recreate it if required in the future, but it's unfit for purpose in its current state and cannot easily be converted to a cost-effective larger system.
7288 failed but https://ci.adoptopenjdk.net/job/Grinder/7301/ succeeded - @lumpfish can you take a look at 7288 and let me know if you're concerned about the failure (in terms of whether it could still be a machine-specific one-off)?
7288 (https://ci.adoptopenjdk.net/job/Grinder/7288/console) looks like it failed with a Jenkins connect issue?
Updated links to re-run:
We don't run impl=openj9 tests in Adoptium, so can win2016 be enabled?
The following openj9 shared classes test targets may fail when they land on test-azure-win2012r2-x64-2 or test-azure-win2016-x64-1.
The symptoms are various out-of-memory exceptions - e.g.
Their Jenkins links show the machines have 4GB RAM:

- https://ci.adoptopenjdk.net/computer/test-azure-win2012r2-x64-2/ - Failed
- https://ci.adoptopenjdk.net/computer/test-azure-win2016-x64-1/ - Failed

The links for two other machines also show them as having 4GB of memory, but the tests pass on those machines:

- https://ci.adoptopenjdk.net/computer/test-azure-win2012r2-x64-1/ - Passed
- https://ci.adoptopenjdk.net/computer/test-azure-win2012r2-x64-3/ - Passed