Open pshipton opened 5 years ago
@Mesbah-Alam
Windows
JVMSHRC777W The softmx limit for shared cache usage is smaller than the minimum feasible value 3980759048 bytes.
08:16:49.391 0x1bd9a00 j9shr.1012 * ** ASSERTION FAILED ** at CompositeCache.cpp:6276: (((0 != _theca) && hasWriteMutex(currentThread) && (getTotalSize() >= softMaxBytes) && (softMaxBytes >= getUsedBytes())))
osx
JVMSHRC756W Failed to set group access permission on the shared cache file as requested by the 'groupAccess' sub-option.
JVMSHRC659E An error has occurred while opening shared memory
JVMSHRC336E Port layer error code = -393970
JVMSHRC337E Platform error message: shmget : Cannot allocate memory
JVMSHRC029E Not enough memory left on the system
JVMSHRC663I Error recovery: destroyed semaphore set with id=65622 associated with shared class cache.
Failed to set group access permission
sounds like a machine problem. Not sure if this was occurring in the passing runs since there is no results file.
We are getting the assert again https://ci.eclipse.org/openj9/job/Test-extended.system-JDK8-win_x86-64_cmprssptrs/146/
JVMSHRC777W The softmx limit for shared cache usage is smaller than the minimum feasible value 3980759048 bytes.
10:57:18.897 0xf39a00 j9shr.1012 * ** ASSERTION FAILED ** at CompositeCache.cpp:6276: (((0 != _theca) && hasWriteMutex(currentThread) && (getTotalSize() >= softMaxBytes) && (softMaxBytes >= getUsedBytes())))
@hangshao0 please help figure this out.
23:43:37.432892070 0x0000000000ee9a00 j9prt.58 Exit <j9shmem_stat exit - contents of controlFile C:\Users\jenkins\workspace\Test-extended.system-JDK8-win_x86-64_cmprssptrs\openjdk-tests\TestConfig\test_output_15477678633841\SharedClassesAPI_0\20190117-174310-SharedClassesAPI\results\caches\javasharedresources\C290M4F1A64_DefinedLocationGroupAccessJavaWL4_G35 is corrupted
23:43:37.432899289 0x0000000000ee9a00 j9shr.2233 Exception * SH_OSCachesysv::initializeHeader: The softmx bytes passed in is too big, it is reset to the cache file size 0
The cache file size is 0 and has been corrupted.
There were updates made to the test by this PR yesterday: https://github.com/eclipse/openj9-systemtest/pull/79.
A 5x Grinder was run at OpenJ9 for it and it passed: https://ci.eclipse.org/openj9/view/Test/job/Test-Grinder/245/consoleFull
The grinder ran on https://ci.eclipse.org/openj9/computer/win2012r2-x86-1/
The release build, in which the test failed, ran on the same machine too : https://ci.eclipse.org/openj9/computer/win2012r2-x86-1/
GEN 00:16:02.085 - 5 execute Run java DefaultLocationGroupAccessJava: Start workload for DefaultLocationGroupAccessJavaWL1
GEN 00:16:02.085 - 6 execute Run java DefaultLocationGroupAccessJava: Start workload for DefaultLocationGroupAccessJavaWL2
GEN 00:16:02.085 - 7 execute Run java DefaultLocationGroupAccessJava: Start workload for DefaultLocationGroupAccessJavaWL3
GEN 00:16:02.085 - 8 execute Run java DefaultLocationGroupAccessJava: Start workload for DefaultLocationGroupAccessJavaWL4
...
...
GEN 00:16:02.085 - 27 execute Run java DefinedLocationGroupAccessJava: Start workload for DefinedLocationGroupAccessJavaWL1
GEN 00:16:02.085 - 28 execute Run java DefinedLocationGroupAccessJava: Start workload for DefinedLocationGroupAccessJavaWL2
GEN 00:16:02.085 - 29 execute Run java DefinedLocationGroupAccessJava: Start workload for DefinedLocationGroupAccessJavaWL3
GEN 00:16:02.085 - 30 execute Run java DefinedLocationGroupAccessJava: Start workload for DefinedLocationGroupAccessJavaWL4
Step 7 & 8 always pass and step 29 & 30 always fail. The only difference I notice is that a cacheDir parameter is passed to step 29 & 30.
23:43:37.432873001 0x0000000000ee9a00 j9shr.510 Exit <shmemOpenWrapper exiting, return code = 110, specLeng = 314572800
23:43:37.432879661 0x0000000000ee9a00 j9prt.56 Entry >j9shmem_stat entered, name = C290M4F1A64_DefinedLocationGroupAccessJavaWL4_G35
23:43:37.432892070 0x0000000000ee9a00 j9prt.58 Exit <j9shmem_stat exit - contents of controlFile C:\Users\jenkins\workspace\Test-extended.system-JDK8-win_x86-64_cmprssptrs\openjdk-tests\TestConfig\test_output_15477678633841\SharedClassesAPI_0\20190117-174310-SharedClassesAPI\results\caches\javasharedresources\\C290M4F1A64_DefinedLocationGroupAccessJavaWL4_G35 is corrupted
j9shr.510 shows the cache file is created successfully (return code 110 means J9PORT_INFO_SHMEM_CREATED). But the trace point j9prt.58 - Trc_PRT_shmem_j9shmem_stat_Exit2 shows up right after that. It is triggered only when CreateFileW() returns NULL or INVALID_HANDLE_VALUE.
memHandle = CreateFileW(unicodePath, GENERIC_READ, (FILE_SHARE_READ | FILE_SHARE_WRITE), NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (unicodeBuffer != unicodePath) {
omrmem_free_memory(unicodePath);
}
if ((NULL == memHandle) || (INVALID_HANDLE_VALUE==memHandle)) {
Trc_PRT_shmem_j9shmem_stat_Exit2(sharedMemoryFullPath); <---------- triggered
omrmem_free_memory(sharedMemoryFullPath);
return -1;
}
So it means the cache file is created then immediately deleted.
I guess it might be related to the cacheDir being used in step 29 & 30. (C:\Users\jenkins\workspace\Test-extended.system-JDK8-win_x86-64_cmprssptrs\openjdk-tests\TestConfig\test_output_15477678633841\SharedClassesAPI_0\20190117-174310-SharedClassesAPI\results\caches). The default cacheDir in Step 7 & 8 works fine.
I've created a PR to include the error code in the trace point: https://github.com/eclipse/openj9/pull/4463
https://github.com/eclipse/openj9-systemtest/issues/75#issuecomment-457631704
Also @Mesbah-Alam launched a 5x grinder (Windows JDK8) in internal Jenkins yesterday, all passed.
Perhaps the failure is machine specific? Grinders only run on a single machine.
Although Mesba says in the comment that the same machine was used.
If it fails in the nightly builds but not in a grinder, maybe one of the other tests is leaving a process running.
@hangshao0 - you mentioned about the default limit, in osx, for the maximum number of shared classes cache, that is allowed to be created - 3. Just curious, is the limit same on Windows too?
Just curious, is the limit same on Windows too?
No.
The length of path
"C:\Users\jenkins\workspace\Test-extended.system-JDK8-win_x86-64_cmprssptrs\openjdk-tests\TestConfig\test_output_15477678633841\SharedClassesAPI_0\20190117-174310-SharedClassesAPI\results\caches\javasharedresources\\C290M4F1A64_DefinedLocationGroupAccessJavaWL4_G35"
is 262, which is greater than MAX_PATH(=260) on Windows. The cacheDir/cacheName is too long. The test needs to be updated to use a shorter cacheDir/cacheName. @Mesbah-Alam
The JVM should do a better job reporting this error.
@hangshao0 - I will have a pr with changed path for non-default cache location.
CacheDir used in grinder is shorter than the one used in builds. So it passed in grinder but failed in builds.
CacheDir in grinder:
C:\Users\jenkins\workspace\Test-Grinder\openjdk-tests\TestConfig\test_output_15483611858909\SharedClassesAPI_0\20190124-141950-SharedClassesAPI\results\caches
CacheDir in builds:
C:\Users\jenkins\workspace\Test-extended.system-JDK8-win_x86-64_cmprssptrs\openjdk-tests\TestConfig\test_output_15477678633841\SharedClassesAPI_0\20190117-174310-SharedClassesAPI\results\caches
Path has been shortened in https://github.com/eclipse/openj9-systemtest/pull/80. They should now be less than MAX_PATH(=260).
This test is now passing on Windows and OSX
Windows JDK8:
22:20:49 STF 21:20:51.524 - Overall result: PASSED
22:20:49
22:20:49 SharedClassesAPI_0_PASSED
from https://ci.eclipse.org/openj9/job/Test-extended.system-JDK8-win_x86-64_cmprssptrs/152/consoleFull
Window JDK11: https://ci.eclipse.org/openj9/job/Test-extended.system-JDK11-win_x86-64_cmprssptrs/152/
OSX JDK8: https://ci.eclipse.org/openj9/job/Test-extended.system-JDK8-osx_x86-64_cmprssptrs/51/
OSX JDK11:
03:01:00 STF 00:00:59.275 - Overall result: PASSED
03:01:00
03:01:00 SharedClassesAPI_0_PASSED
from https://ci.eclipse.org/openj9/job/Test-extended.system-JDK11-osx_x86-64_cmprssptrs/68/consoleFull
The JVM should do a better job reporting this error.
@hangshao0 - do we have an issue opened for this?
do we have an issue opened for this?
I will create a pull request to fix this shortly.
I assume these are the result of #74 and #71
https://ci.eclipse.org/openj9/job/Test-extended.system-JDK8-win_x86-64_cmprssptrs/141/ https://ci.eclipse.org/openj9/job/Test-extended.system-JDK11-win_x86-64_cmprssptrs/137/
https://ci.eclipse.org/openj9/job/Test-extended.system-JDK8-osx_x86-64_cmprssptrs/36/ https://ci.eclipse.org/openj9/job/Test-extended.system-JDK11-osx_x86-64_cmprssptrs/53/