pshipton closed this issue.
@Mesbah-Alam @smlambert
I also note that https://ci.eclipse.org/openj9/job/Test-extended.system-JDK8-osx_x86-64_cmprssptrs/8 found a cache, sharedcc_jenkins, which is unrelated to the test, and destroyed it.
STF 01:16:10.256 - +------ Step 3 - Destroy Persistent Shared Classes Caches
STF 01:16:10.256 - | Destroy all persistent caches
STF 01:16:10.256 - |
STF 01:16:10.257 - Running command: /Users/jenkins/workspace/Test-extended.system-JDK8-osx_x86-64_cmprssptrs/openjdkbinary/j2sdk-image/jre/bin/../../bin/java -Xshareclasses:destroyAll
STF 01:16:10.257 - Redirecting stderr to /Users/jenkins/workspace/Test-extended.system-JDK8-osx_x86-64_cmprssptrs/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15453722074971/SharedClassesAPI_0/20181221-011606-SharedClassesAPI/results/3.SCC.stderr
STF 01:16:10.257 - Redirecting stdout to /Users/jenkins/workspace/Test-extended.system-JDK8-osx_x86-64_cmprssptrs/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15453722074971/SharedClassesAPI_0/20181221-011606-SharedClassesAPI/results/3.SCC.stdout
STF 01:16:10.265 - Monitoring processes: SCC
SCC stderr
SCC stderr Attempting to destroy all caches in cacheDir /Users/jenkins/javasharedresources/
SCC stderr
SCC stderr JVMSHRC806I Compressed references persistent shared cache "sharedcc_jenkins" has been destroyed. Use option -Xnocompressedrefs if you want to destroy a non-compressed references cache.
There are a few similar occurrences at https://ci.eclipse.org/openj9/job/Test-extended.system-JDK11-osx_x86-64_cmprssptrs/36/tapResults/:
STF 04:24:08.325 - Monitoring processes: WL1 WL2 WL3 WL4
STF 04:24:14.490 - **FAILED** Process WL3 ended with exit code (1) and not the expected exit code/s (0)
SharedClassesWorkloadTest_Softmx_IncreaseDecrease_0/20190102-042841-SharedClassesWorkloadTest_Softmx_IncreaseDecrease/results/4.jvm1.stderr
Failed to find class java/lang/Object in shared cache for class-loader id 0.
Stored class java/lang/Object in shared cache for class-loader id 0 with URL /Users/jenkins/workspace/Test-extended.system-JDK11-osx_x86-64_cmprssptrs/openjdkbinary/j2sdk-image/lib/modules (index 0).
Failed to find class java/lang/J9VMInternals in shared cache for class-loader id 0.
Stored class java/lang/J9VMInternals in shared cache for class-loader id 0 with URL /Users/jenkins/workspace/Test-extended.system-JDK11-osx_x86-64_cmprssptrs/openjdkbinary/j2sdk-image/lib/modules (index 0).
Failed to find class com/ibm/oti/vm/VM in shared cache for class-loader id 0.
The test does define a test-specific folder in which it creates some caches. However, some of the use cases that the test implements deliberately avoid that folder and use the default cache location instead: https://github.com/eclipse/openj9-systemtest/blob/956e2ac3f18e6c37c93d32b8fab79bc54d2594c3/openj9.test.sharedClasses.jvmti/src/test.sharedClasses.jvmti/net/openj9/stf/SharedClassesAPI.java#L60.
So I suspect that the test finding caches unrelated to it is working as designed.
Regarding "found a cache sharedcc_jenkins which is unrelated to the test, and destroyed it":
The test does destroy all persistent and non-persistent caches from the default location, i.e., not the folder specific to the test:
https://github.com/eclipse/openj9-systemtest/blob/956e2ac3f18e6c37c93d32b8fab79bc54d2594c3/openj9.test.sharedClasses.jvmti/src/test.sharedClasses.jvmti/net/openj9/stf/SharedClassesAPI.java#L118
Hi Simon, since I was not involved in the original development of this test, do you recall why the step that destroys all caches was added to its setup stage?
This is deleting caches that are completely unrelated to the test (e.g. sharedcc_jenkins) and that may be important to the Jenkins slave machine running the test. We need a more targeted "clean up" method for this test. @lumpfish
@Mesbah-Alam what is the outlook for fixing this? The SharedClassesAPI_0 test continues to fail every night on Windows, and likely on osx.
I recall that the shared classes tests had issues when caches were left over from previous tests, but I don't know the specifics. I think one issue was that caches left in unique, test-specific directories simply accumulated over time with no means of clearing them, which went unnoticed until the test machine started to run out of resources.
Are the caches that the test is unable to destroy there for a reason? Has the default Java behaviour changed so that shared classes is now enabled for any 'general' Java process? If so, arbitrarily cleaning them up won't be tenable any more. If we are only concerned about the test aborting because the delete fails, then one option would be to make the delete failure non-fatal.
@lumpfish - tests can certainly clean up the shared classes caches in the test-specific directories. The problem arises when they try to delete shared classes caches from elsewhere, which include caches that the tests fail to delete, e.g.:
CC stderr INFO: Attempting to delete cache: sharedcc_LOCAL SERVICE and return value from delete call was: -2
Can we restrict the tests to cleaning up only the test-specific caches in the test-specific directory, and perhaps deleting only the caches they create outside of it (e.g. by providing the cache name in the delete command)?
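A minimal sketch of what naming the cache in the delete command could look like, assuming the test launches a separate JVM for cleanup: instead of `-Xshareclasses:destroyAll`, it builds `-Xshareclasses:cacheDir=<dir>,name=<name>,destroy` for each cache it created. The helper `buildDestroyCommand`, the JDK path, the directory and the cache name are all hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class TargetedCacheCleanup {

    // Builds "java -Xshareclasses:cacheDir=<dir>,name=<name>,destroy" for one named cache,
    // instead of "-Xshareclasses:destroyAll", which removes every cache it finds.
    // buildDestroyCommand is a hypothetical helper, not part of the test framework.
    static List<String> buildDestroyCommand(Path javaExe, Path cacheDir, String cacheName) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaExe.toString());
        cmd.add("-Xshareclasses:cacheDir=" + cacheDir + ",name=" + cacheName + ",destroy");
        return cmd;
    }

    public static void main(String[] args) {
        Path java = Paths.get("/opt/jdk/bin/java");            // hypothetical JDK location
        Path dir = Paths.get("/tmp/SharedClassesAPI_caches"); // hypothetical test-specific directory
        List<String> cmd = buildDestroyCommand(java, dir, "testCache1");
        System.out.println(String.join(" ", cmd));
        // To run it: new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Passing the arguments as a list (for example to `ProcessBuilder`) rather than as one shell string avoids quoting problems with cache names that contain spaces, such as sharedcc_LOCAL SERVICE. As the JVMSHRC806I message above shows for -Xnocompressedrefs, a cache of a different type with the same name may need additional options to be destroyed.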
I guess that since we just disabled shared classes by default, we won't see this problem again until it is re-enabled for the next release.
It seems the caches still persist on the machines, even though shared classes is now disabled by default. The test is still failing: https://ci.eclipse.org/openj9/job/Test-extended.system-JDK11-win_x86-64_cmprssptrs/129
If we can't fix the tests soon, we may need to clean up the machines. @AdamBrousseau @jdekonin
I don't think it should be an either/or scenario; both should happen: 1) work to fix the tests, and 2) regular/automated machine cleanup.
What are the files/folders I need to cleanup? I will add it to the cleanup job.
~/javasharedresources/?
Sounds right. Please check the machine(s) for a shared cache file containing sharedcc_LOCAL SERVICE in the name.
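For the cleanup job, a minimal sketch of locating such files, assuming the on-disk cache file names contain the cache name along with extra prefixes or suffixes (hence a substring match rather than an exact one). The class and method names are hypothetical, and the sketch only reports matches; deleting them, ideally while no JVM is attached to the cache, is left to the job.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StaleCacheFinder {

    // Returns the regular files directly under cacheDir whose file name contains nameFragment,
    // sorted by path. Returns an empty list if cacheDir does not exist.
    static List<Path> findCacheFiles(Path cacheDir, String nameFragment) throws IOException {
        if (!Files.isDirectory(cacheDir)) {
            return Collections.emptyList();
        }
        try (Stream<Path> entries = Files.list(cacheDir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().contains(nameFragment))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // ~/javasharedresources/ is the location named in this thread; other platforms may differ.
        Path dir = Paths.get(System.getProperty("user.home"), "javasharedresources");
        for (Path p : findCacheFiles(dir, "sharedcc_LOCAL SERVICE")) {
            // Only reports the files; deleting them is left to the cleanup job.
            System.out.println("Stale cache file: " + p);
        }
    }
}
```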
By fixing the tests, I understood from the discussion above that we should update the test logic so that a failure during cache clean-up does not fail the test, i.e., as @lumpfish mentioned above, "make the delete failure non-fatal". I am working on this update.
Technically the test should fail if it can't delete the caches it created. Or it should only attempt to delete the caches it created, and continue to fail if it doesn't work.
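One way to reconcile this with the "non-fatal" suggestion: a failure to destroy a cache the test created stays fatal, while failures on caches it did not create are only logged. A minimal sketch, assuming the test records which caches it created and whether each destroy attempt succeeded; `CleanupPolicy`, `fatalFailures` and the cache names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class CleanupPolicy {

    // destroyResults maps each cache the test tried to destroy to whether the destroy succeeded.
    // Returns, in name order, the caches this test created but could not destroy; only those
    // should fail the test. Failures on caches the test did not create are logged as warnings.
    static List<String> fatalFailures(Map<String, Boolean> destroyResults, Set<String> ownedCaches) {
        List<String> fatal = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : new TreeMap<>(destroyResults).entrySet()) {
            if (Boolean.TRUE.equals(e.getValue())) {
                continue; // destroyed successfully
            }
            if (ownedCaches.contains(e.getKey())) {
                fatal.add(e.getKey());
            } else {
                System.err.println("WARNING: could not destroy unrelated cache: " + e.getKey());
            }
        }
        return fatal;
    }

    public static void main(String[] args) {
        // Hypothetical outcomes for two test-owned caches and one unrelated cache.
        Map<String, Boolean> results = new HashMap<>();
        results.put("testCache1", true);
        results.put("testCache2", false);
        results.put("sharedcc_LOCAL SERVICE", false);
        Set<String> owned = new HashSet<>(Arrays.asList("testCache1", "testCache2"));

        List<String> fatal = fatalFailures(results, owned);
        if (!fatal.isEmpty()) {
            System.out.println("FAILED: could not destroy test caches " + fatal);
        }
    }
}
```

If the test follows the stricter option and only ever attempts to delete caches it created, the warning branch is simply never taken.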
This is one of the two remaining (non-osx) failures in the nightly builds for the 0.12 release.
@Mesbah-Alam what is the outlook for fixing?
@jdekonin @AdamBrousseau is it possible to clean the sharedcc_LOCAL SERVICE shared cache from the machines? Or is this related to some other running process? I was assuming it was related to having shared classes enabled by default, but shared classes is no longer enabled by default in the latest builds.
I am currently testing the PR that updates all the tests to destroy only the test-specific cache (instead of all caches).
@pshipton
DESTROY_FAILED_CURRENT_GEN_CACHE seems to be a test issue: https://ci.eclipse.org/openj9/view/Test/job/Test-extended.system-JDK8-win_x86/137/tapResults/
The SharedClassesCacheChecker receives it when it tries to delete the cache it owns itself: DefaultLocationGroupAccessJavaNoIterator. https://github.com/eclipse/openj9-systemtest/issues/78 has been opened to fix this.
This has been fixed via https://github.com/eclipse/openj9-systemtest/issues/78
The test has been running fine: https://ci.eclipse.org/openj9/view/Test/job/Test-extended.system-JDK8-win_x86/198/tapResults/
@pshipton - could you please close this issue at this point?
https://ci.eclipse.org/openj9/job/Test-extended.system-JDK8-win_x86/108 https://ci.eclipse.org/openj9/job/Test-extended.system-JDK8-win_x86-64_cmprssptrs/111
The test seems to have found a shared cache which is unrelated to the test. Perhaps the test should set a cache directory so it does not find unrelated cache files.
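A minimal sketch of that suggestion, assuming each test run creates a fresh directory and passes it through the cacheDir suboption to every JVM it starts, including the cleanup JVM, so destroy and destroyAll are confined to that directory. `IsolatedCacheDir`, `shareClassesOption`, the directory prefix and the cache name are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class IsolatedCacheDir {

    // Builds an -Xshareclasses option that places the named cache in cacheDir, so a JVM
    // started with it works with that directory rather than the default cache location.
    static String shareClassesOption(Path cacheDir, String cacheName) {
        return "-Xshareclasses:cacheDir=" + cacheDir.toAbsolutePath() + ",name=" + cacheName;
    }

    public static void main(String[] args) throws IOException {
        // A fresh, uniquely named directory for this run; the prefix is illustrative only.
        Path dir = Files.createTempDirectory("SharedClassesAPI");
        System.out.println(shareClassesOption(dir, "testCache1"));
    }
}
```

Per-run directories only help if they are removed at the end of the run; otherwise they accumulate on the machine, which is the problem described earlier in this thread.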