eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.27k stars 721 forks source link

extended.functional_x86-32_windows testOSMXBean* getTotalSwapSpaceSize() API returned -1 #7716

Open AdamBrousseau opened 4 years ago

AdamBrousseau commented 4 years ago

Failing on the internal nightly jdk8 win32 build. Seems to be fine on win64 across versions. 4 tests

All failing with he same error. Doesn't seem to be machine specific. 100% failure rate.

[2019-11-07T08:44:46.180Z] [TestOperatingSystemMXBean] [INFO]  ----------------------------------------------------------
[2019-11-07T08:44:46.180Z] [TestOperatingSystemMXBean] [INFO]  Starting the OperatingSystemMXBean usage API tests........
[2019-11-07T08:44:46.180Z] [TestOperatingSystemMXBean] [INFO]  ----------------------------------------------------------
[2019-11-07T08:44:46.180Z] [TestOperatingSystemMXBean] [INFO] Tests starting at: 1573116269230 ms.
[2019-11-07T08:44:46.180Z] [TestOperatingSystemMXBean] [INFO] Testing getTotalPhysicalMemory() & getFreePhysicalMemorySize() APIs
[2019-11-07T08:44:46.180Z] [TestOperatingSystemMXBean] [INFO] Testing getTotalSwapSpaceSize() & getFreeSwapSpaceSize() APIs
[2019-11-07T08:44:46.180Z] [TestOperatingSystemMXBean] [INFO] Tests finished at: 1573116269245 ms.
[2019-11-07T08:44:46.180Z] FAILED: runTestOSMXBean
[2019-11-07T08:44:46.180Z] java.lang.AssertionError: Error: getTotalSwapSpaceSize() API returned -1, test failed
[2019-11-07T08:44:46.180Z]  at org.testng.Assert.fail(Assert.java:96)
[2019-11-07T08:44:46.180Z]  at org.openj9.test.management.TestOperatingSystemMXBean.test_SwapSpaceSizeAPIs(TestOperatingSystemMXBean.java:227)
[2019-11-07T08:44:46.180Z]  at org.openj9.test.management.TestOperatingSystemMXBean.runTestOSMXBean(TestOperatingSystemMXBean.java:171)
[2019-11-07T08:44:46.180Z]  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[2019-11-07T08:44:46.180Z]  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[2019-11-07T08:44:46.180Z]  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[2019-11-07T08:44:46.180Z]  at java.lang.reflect.Method.invoke(Method.java:498)
[2019-11-07T08:44:46.180Z]  at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:124)
[2019-11-07T08:44:46.180Z]  at org.testng.internal.Invoker.invokeMethod(Invoker.java:580)
[2019-11-07T08:44:46.180Z]  at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:716)
[2019-11-07T08:44:46.180Z]  at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:988)
[2019-11-07T08:44:46.180Z]  at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:125)
[2019-11-07T08:44:46.180Z]  at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:109)
[2019-11-07T08:44:46.180Z]  at org.testng.TestRunner.privateRun(TestRunner.java:648)
[2019-11-07T08:44:46.180Z]  at org.testng.TestRunner.run(TestRunner.java:505)
[2019-11-07T08:44:46.180Z]  at org.testng.SuiteRunner.runTest(SuiteRunner.java:455)
[2019-11-07T08:44:46.180Z]  at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:450)
[2019-11-07T08:44:46.180Z]  at org.testng.SuiteRunner.privateRun(SuiteRunner.java:415)
[2019-11-07T08:44:46.180Z]  at org.testng.SuiteRunner.run(SuiteRunner.java:364)
[2019-11-07T08:44:46.180Z]  at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
[2019-11-07T08:44:46.180Z]  at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:84)
[2019-11-07T08:44:46.180Z]  at org.testng.TestNG.runSuitesSequentially(TestNG.java:1208)
[2019-11-07T08:44:46.180Z]  at org.testng.TestNG.runSuitesLocally(TestNG.java:1137)
[2019-11-07T08:44:46.180Z]  at org.testng.TestNG.runSuites(TestNG.java:1049)
[2019-11-07T08:44:46.180Z]  at org.testng.TestNG.run(TestNG.java:1017)
[2019-11-07T08:44:46.180Z]  at org.testng.TestNG.privateMain(TestNG.java:1354)
[2019-11-07T08:44:46.180Z]  at org.testng.TestNG.main(TestNG.java:1323)

Seeing as this passes on the external build I would assume it is something on machine config side. I will just need someone to tell help me figure out what to change.

pshipton commented 4 years ago

I expect the following call is failing https://github.com/eclipse/omr/blob/a960b4f24a3c9372d654579109a6b32954e182a2/port/win32/omrsysinfo.c#L785-L789

pshipton commented 4 years ago

Does the machine have any swap space configured?

AdamBrousseau commented 4 years ago

If I'm looking in the right spot, it would appear so. Internal machine #2 (Win2016) MyPC->Properties->Advanced system settings->Advanced (tab)->(Performance) Settings->Advanced (tab)->Change image vs external machine #3 (Win2012R2) image

AdamBrousseau commented 4 years ago

Reran internally on a 2012r2 machine and it passed. So it seems to be an issue with win2016. Adopt has both 2012 and 2016 but I don't believe they regularly run the functional suite. I also don't have permissions to run grinders there.

pshipton commented 4 years ago

Do we have win2016 in the farm? We could try the test there and double check it's working on farm machines.

pshipton commented 4 years ago

i.e. job_output.php?id=620517467

AdamBrousseau commented 4 years ago

Seems to work fine on 10, 2012, and 2016 on the legacy farm. build_id=432218

AdamBrousseau commented 4 years ago

Fails at Adopt on 2012 for the same error https://ci.adoptopenjdk.net/job/Grinder/1039/ But passes at Adopt on 2016 https://ci.adoptopenjdk.net/job/Grinder/1040/ So it seems to be more of a machine config issue rather than a Windows version issue.

pshipton commented 4 years ago
import java.lang.management.*;

public class Test {
        public static void main(String[] args) throws Throwable {
                com.ibm.lang.management.OperatingSystemMXBean osbean = (com.ibm.lang.management.OperatingSystemMXBean)ManagementFactory.getOperatingSystemMXBean();
                System.out.println(osbean.getTotalSwapSpaceSize());
        }
}

bin/java -Xtrace:print=omrport.585-590 Test

19:21:27.253*0x22158700         omrport.585      > omrsysinfo_get_memory_info: Function entered.
19:21:27.253 0x22158700         omrport.589    * - omrsysinfo_get_memory_info: Error: Failed collecting data.
19:21:27.253 0x22158700         omrport.590      < omrsysinfo_get_memory_info: Return code = -358.
19:21:27.409*0x22158700         omrport.585      > omrsysinfo_get_memory_info: Function entered.
19:21:27.409 0x22158700         omrport.589    * - omrsysinfo_get_memory_info: Error: Failed collecting data.
19:21:27.409 0x22158700         omrport.590      < omrsysinfo_get_memory_info: Return code = -358.

omrport.589 1 1 1 N Trc_PRT_sysinfo_get_memory_info_dataQueryFailed "* omrsysinfo_get_memory_info: Error: Failed collecting data." https://github.com/eclipse/omr/blob/a960b4f24a3c9372d654579109a6b32954e182a2/port/win32/omrsysinfo.c#L777-L782

The error code -358 comes from the OMR port library, so isn't helpful. Looking at https://docs.microsoft.com/en-us/windows/win32/api/pdh/nf-pdh-pdhcollectquerydata there are two possible return codes, PDH_INVALID_HANDLE and PDH_NO_DATA. I'm assuming it's not PDH_INVALID_HANDLE because the code works on other machines. PDH_NO_DATA shows: "The query may not contain data because the user is not running with an elevated token (see Limited User Access Support). "

vsebe commented 4 years ago

Add user to Performance Log Users group.

AdamBrousseau commented 4 years ago

Should this be documented in the OpenJ9 functional test requirements? cc @smlambert

smlambert commented 4 years ago

Seems like it is something to be captured in ansible playbooks for x32 win machines and if I understand it correctly it would be a problem for any user who calls getTotalSwapSpaceSize() on a x32 windows 2012 machine?

AdamBrousseau commented 4 years ago

Sorry, was getting this crossed with #7717 ignore my comment.

AdamBrousseau commented 4 years ago

Copied from https://github.com/eclipse/openj9/issues/7717#issuecomment-558718727

Root cause: Newer version of OpenSSH (7.9->8.0 see changelog [1], Portability section), which is installed during machine & Cygwin setup, has changes to the setup which further restrict the configuration and resulting default permissions. Specifically, the process now runs as SYSTEM user instead of an added cyg_server user (see now outdated setup instructions in [2]). Which has the effect of cascading less permissions down to the jenkins user when logged in via cygwin/ssh. Jenkins no longer has (by default) access to run WMI/WMIC or query the system for swap space (memory info). The "fix" is to give more permissions to the jenkins user.

To resolve this issue (#7716) getTotalSwapSpaceSize() API returned -1

To resolve #7717 getPID() returned Access Deined

The following settings are required in order to allow jenkins remote access to WMI:

Refs
  1. https://www.openssh.com/releasenotes.html
    * sshd(8): On Cygwin, run as SYSTEM where possible, using S4U for
    token creation if it supports MsV1_0 S4U Logon.
  2. https://docs.cloudbees.com/docs/cloudbees-jenkins-enterprise/latest/admin-guide/agents#_microsoft_windows
  3. From a Cygwin64 Terminal launched to run as an Administrator run the ssh-host-config script:
  4. Provide the following answers: a. Should StrictModes be used: yes b. Should privilege separation be used: yes c. new local account 'sshd': yes d. Do you want to install sshd as a service: yes e. Enter the value of CYGWIN for the daemon: f. Do you want to use a different name: no g. Create new privileged user account 'WIN-…​\cyg_server' (Cygwin name: 'cyg_server'): yes h. Please enter the password: (you will need to provide a password)
AdamBrousseau commented 4 years ago

My follow up question, is this something that should be documented at OMR since OMR testing will fail if these settings aren't applied? Also, if a thirdparty app tries to use the omrsysinfo_get_memory_info function on win32 it will fail if they are running via Cygwin-ssh without adding these extra perms.

lumpfish commented 3 years ago

This is also occurring at AdoptOpenJDK, but on win64_xl (the test is skipped on win64 because it specifies -Xnocompressedrefs) - e.g. https://ci.adoptopenjdk.net/job/Test_openjdk8_j9_extended.functional_x86-64_windows_xl/63/. It does not fail on all test machines - see https://github.com/AdoptOpenJDK/openjdk-infrastructure/issues/1805

17:55:47  openjdk version "1.8.0_282-internal"
17:55:47  OpenJDK Runtime Environment (build 1.8.0_282-internal-202101051709-b07)
17:55:47  Eclipse OpenJ9 VM (build master-1910cfa3a, JRE 1.8.0 Windows Server 2012 R2 amd64-64-Bit 20210105_550 (JIT enabled, AOT enabled)
17:55:47  OpenJ9   - 1910cfa3a
17:55:47  OMR      - a9b64bdc8
17:55:47  JCL      - 722ab284 based on jdk8u282-b07)
19:46:46  ===============================================
19:46:46  Running test testOSMXBeanLocal_0 ...
19:46:46  ===============================================
19:46:46  testOSMXBeanLocal_0 Start Time: Tue Jan  5 19:46:41 2021 Epoch Time (ms): 1609876001610
19:46:46  "C:/Users/jenkins/workspace/Test_openjdk8_j9_extended.functional_x86-64_windows_xl/openjdkbinary/j2sdk-image\\bin\\java" -Xshareclasses:destroyAll; "C:/Users/jenkins/workspace/Test_openjdk8_j9_extended.functional_x86-64_windows_xl/openjdkbinary/j2sdk-image\\bin\\java" -Xshareclasses:groupAccess,destroyAll; echo "cache cleanup done";
19:46:46  JVMSHRC005I No shared class caches available
19:46:46  JVMSHRC005I No shared class caches available
19:46:46  cache cleanup done
19:46:46  variation: Mode100
19:46:46  JVM_OPTIONS:  -Xint -Xgcpolicy:optthruput -Xnocompressedrefs 
19:46:46  { itercnt=1; \
19:46:46  mkdir -p "C:/Users/jenkins/workspace/Test_openjdk8_j9_extended.functional_x86-64_windows_xl/openjdk-tests/\\TKG\\test_output_16098696244505\\testOSMXBeanLocal_0"; \
19:46:46  cd "C:/Users/jenkins/workspace/Test_openjdk8_j9_extended.functional_x86-64_windows_xl/openjdk-tests/\\TKG\\test_output_16098696244505\\testOSMXBeanLocal_0"; \
19:46:46  export PATH="/cygdrive/c/Users/jenkins/workspace/Test_openjdk8_j9_extended.functional_x86-64_windows_xl/openjdkbinary/openjdk-test-image/openj9:/cygdrive/c/Users/jenkins/workspace/Test_openjdk8_j9_extended.functional_x86-64_windows_xl/openjdkbinary/j2sdk-image/jre/bin/default:/cygdrive/c/Users/jenkins/workspace/Test_openjdk8_j9_extended.functional_x86-64_windows_xl/openjdkbinary/j2sdk-image/jre/bin/j9vm:/cygdrive/c/Program Files (x86)/Common Files/Oracle/Java/javapath:/usr/bin:/cygdrive/c/Windows/system32:/cygdrive/c/Windows:/cygdrive/c/Windows/System32/Wbem:/cygdrive/c/Windows/System32/WindowsPowerShell/v1.0:/cygdrive/c/Strawberry/perl/bin:/cygdrive/c/Program Files/Git/cmd:/cygdrive/c/Program Files/Java/jdk8u172-b11/bin:/cygdrive/c/apache-ant/apache-ant-1.10.3/bin:/cygdrive/c/Program Files (x86)/Windows Kits/8.1/Windows Performance Toolkit:/cygdrive/c/Program Files/Microsoft SQL Server/110/Tools/Binn:/cygdrive/c/Program Files (x86)/Microsoft SDKs/TypeScript/1.0:/cygdrive/c/Program Files/Microsoft SQL Server/120/Tools/Binn"; \
19:46:46  "C:/Users/jenkins/workspace/Test_openjdk8_j9_extended.functional_x86-64_windows_xl/openjdkbinary/j2sdk-image\\bin\\java"  -Xint -Xgcpolicy:optthruput -Xnocompressedrefs  \
19:46:46  -DRUNLOCAL=true \
19:46:46  -cp "C:/Users/jenkins/workspace/Test_openjdk8_j9_extended.functional_x86-64_windows_xl/openjdk-tests///..//jvmtest\\TestConfig\\resources;C:/Users/jenkins/workspace/Test_openjdk8_j9_extended.functional_x86-64_windows_xl/openjdk-tests/\\TKG\\lib\\testng.jar;C:/Users/jenkins/workspace/Test_openjdk8_j9_extended.functional_x86-64_windows_xl/openjdk-tests/\\TKG\\lib\\jcommander.jar;C:/Users/jenkins/workspace/Test_openjdk8_j9_extended.functional_x86-64_windows_xl/openjdk-tests///..//jvmtest\\functional\\JLM_Tests\\jlm_tests.jar" \
19:46:46  org.testng.TestNG -d "C:/Users/jenkins/workspace/Test_openjdk8_j9_extended.functional_x86-64_windows_xl/openjdk-tests/\\TKG\\test_output_16098696244505\\testOSMXBeanLocal_0" "C:/Users/jenkins/workspace/Test_openjdk8_j9_extended.functional_x86-64_windows_xl/openjdk-tests///..//jvmtest\\functional\\JLM_Tests\\testng.xml" \
19:46:46  -testnames testOSMXBean \
19:46:46  -groups level.extended \
19:46:46  -excludegroups d.*.win_x86-64,d.*.arch.x86,d.*.os.win,d.*.bits.64,d.*.generic-all; \
19:46:46  if [ $? -eq 0 ] ; then echo ""; echo "testOSMXBeanLocal_0""_PASSED"; echo ""; cd C:/Users/jenkins/workspace/Test_openjdk8_j9_extended.functional_x86-64_windows_xl/openjdk-tests/;  else echo ""; echo "testOSMXBeanLocal_0""_FAILED"; echo ""; fi; } 2>&1 | tee -a "C:/Users/jenkins/workspace/Test_openjdk8_j9_extended.functional_x86-64_windows_xl/openjdk-tests/\\TKG\\test_output_16098696244505\\TestTargetResult";
19:46:46  [IncludeExcludeTestAnnotationTransformer] [INFO] exclude file is C:/Users/jenkins/workspace/Test_openjdk8_j9_extended.functional_x86-64_windows_xl/openjdk-tests/\\TestConfig\\resources\\excludes\\latest_exclude_8.txt
19:46:46  ...
19:46:46  ... TestNG 6.14.2 by Cédric Beust (cedric@beust.com)
19:46:46  ...
19:46:46  
19:46:46  [TestOperatingSystemMXBean] [INFO]  ----------------------------------------------------------
19:46:46  [TestOperatingSystemMXBean] [INFO]  Starting the OperatingSystemMXBean usage API tests........
19:46:46  [TestOperatingSystemMXBean] [INFO]  ----------------------------------------------------------
19:46:46  [TestOperatingSystemMXBean] [INFO] Tests starting at: 1609876002323 ms.
19:46:46  [TestOperatingSystemMXBean] [INFO] Testing getTotalPhysicalMemory() & getFreePhysicalMemorySize() APIs
19:46:46  [TestOperatingSystemMXBean] [INFO] Testing getTotalSwapSpaceSize() & getFreeSwapSpaceSize() APIs
19:46:46  [TestOperatingSystemMXBean] [INFO] Tests finished at: 1609876002328 ms.
19:46:46  FAILED: runTestOSMXBean
19:46:46  java.lang.AssertionError: Error: getTotalSwapSpaceSize() API returned -1, test failed
19:46:46    at org.testng.Assert.fail(Assert.java:96)
19:46:46    at org.openj9.test.management.TestOperatingSystemMXBean.test_SwapSpaceSizeAPIs(TestOperatingSystemMXBean.java:227)
19:46:46    at org.openj9.test.management.TestOperatingSystemMXBean.runTestOSMXBean(TestOperatingSystemMXBean.java:171)
19:46:46    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
19:46:46    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
19:46:46    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
19:46:46    at java.lang.reflect.Method.invoke(Method.java:498)
19:46:46    at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:124)
19:46:46    at org.testng.internal.Invoker.invokeMethod(Invoker.java:580)
19:46:46    at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:716)
19:46:46    at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:988)
19:46:46    at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:125)
19:46:46    at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:109)
19:46:46    at org.testng.TestRunner.privateRun(TestRunner.java:648)
19:46:46    at org.testng.TestRunner.run(TestRunner.java:505)
19:46:46    at org.testng.SuiteRunner.runTest(SuiteRunner.java:455)
19:46:46    at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:450)
19:46:46    at org.testng.SuiteRunner.privateRun(SuiteRunner.java:415)
19:46:46    at org.testng.SuiteRunner.run(SuiteRunner.java:364)
19:46:46    at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
19:46:46    at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:84)
19:46:46    at org.testng.TestNG.runSuitesSequentially(TestNG.java:1208)
19:46:46    at org.testng.TestNG.runSuitesLocally(TestNG.java:1137)
19:46:46    at org.testng.TestNG.runSuites(TestNG.java:1049)
19:46:46    at org.testng.TestNG.run(TestNG.java:1017)
19:46:46    at org.testng.TestNG.privateMain(TestNG.java:1354)
19:46:46    at org.testng.TestNG.main(TestNG.java:1323)
19:46:46  
19:46:46  
19:46:46  ===============================================
19:46:46      testOSMXBean
19:46:46      Tests run: 1, Failures: 1, Skips: 0
19:46:46  ===============================================
pshipton commented 3 years ago

@lumpfish see https://github.com/eclipse/openj9/issues/7716#issuecomment-558719129 which describes how to fix the machine. i.e. To resolve this issue (#7716) getTotalSwapSpaceSize() API returned -1

Add jenkins user to Performance Log Users group. See AdoptOpenJDK/openjdk-infrastructure#1009