Open LongyuZhang opened 3 years ago
Is this failure intermittent or consistent? Does it fail on Java 8?
FYI @knn-k
The jdk11 Aarch64 pipeline was just enabled and has only had one build so far. I tested with a personal build and it passed, so I think the failure is intermittent. The JDK 8 nightly has not been enabled because of limited machine resources, but it also passed in a personal build.
@LongyuZhang Could you try using the same SDK as the test build (OpenJDK Runtime Environment Openj9 (build 11.0.9+8-202009282343))? The Grinder runs that you have use an older version from the Adopt API (OpenJDK Runtime Environment Openj9 (build 11.0.9+8-202009252344)).
@llxia Thanks for the reminder. I have updated the SDK to the same version as the nightly build and tested on multiple machines. Only the machine used by the nightly build (test-aws-ubuntu1804-armv8-1) also failed at Test 26 (I cancelled the run manually after it had hung for 1 hour). The other machines (test-packet-ubuntu1604-armv8-2, test-aws-rhel76-armv8-2, test-aws-rhel76-armv8-4, test-packet-ubuntu1604-armv8-1) all passed the test; see https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/4002 - 4005 . So it appears to be a machine issue.
FYI https://openj9.slack.com/archives/C8312LCV9/p1602551615048900
There’s a pthread_cond_signal bug affecting glibc 2.27. It is worth being aware of this if unexplained deadlocks are occurring: https://sourceware.org/bugzilla/show_bug.cgi?id=25847
Re-occurred on GA jdk-11.0.9+11_openj9-0.23.0 : https://ci.adoptopenjdk.net/job/Test_openjdk11_j9_sanity.functional_aarch64_linux/19/consoleFull
Hi @andrew-m-leonard, I discussed the testSCCMLTests1_openj9_1 failure you mentioned above (https://ci.adoptopenjdk.net/job/Test_openjdk11_j9_sanity.functional_aarch64_linux/19/consoleFull) with @llxia. It fails on tests 57-63, which is not the same as the Test 26 failure in this issue. Tests 57-63 were newly enabled by Hang Shao’s PR and have already passed in nightly builds #17 and #18.
Builds #19, #20 and #22 that you mentioned failed because they were not triggered by the nightly pipeline, which hits the known issue "Functional testing uses wrong test material in release testing". Lan has a WIP PR for that issue, which has not been merged yet.
This happened twice last night, aarch64 and Windows: https://github.com/AdoptOpenJDK/openjdk-infrastructure/issues/1579#issuecomment-726595987 https://github.com/AdoptOpenJDK/openjdk-infrastructure/issues/1579#issuecomment-726630670
Can we get more debug output added to getUnixPid(), please? How can that fail?
FYI, as I recall getUnixPid() is a hack that reaches into the implementation using reflection to find the pid, as there was no API to get it in Java 8. I believe Java 11 does provide an API.
Has the test server (test-aws-ubuntu1804-armv8-1) been rebooted recently? If not, I would like it to be rebooted to see whether the failure disappears.
@pshipton The logic here https://github.com/eclipse/openj9/blob/efdb86514d722cf83747d9d8badc449fe6121658/test/functional/cmdline_options_tester/src/Test.java#L415 will only work on jdk8 as UNIXProcess.java does not exist in jdk11+. For jdk11+ it should use the jdk11 API to get the pid.
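For illustration, a minimal sketch of how the pid could be read on jdk9+ without reflection, using `Process.pid()` and `ProcessHandle.current().pid()`. The class name `PidSketch` and its helper methods are hypothetical and are not taken from Test.java; the child process here is just `java -version` from the running JDK.

```java
import java.io.IOException;
import java.nio.file.Paths;

public class PidSketch {
    // Hypothetical helper: path to the java launcher of the running JDK.
    static String javaBinary() {
        return Paths.get(System.getProperty("java.home"), "bin", "java").toString();
    }

    // Native pid of a child process via the Java 9+ API (no reflection).
    // May throw UnsupportedOperationException on unusual Process implementations.
    static long pidOf(Process proc) {
        return proc.pid();
    }

    // Native pid of the current JVM.
    static long currentPid() {
        return ProcessHandle.current().pid();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Process child = new ProcessBuilder(javaBinary(), "-version")
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .redirectError(ProcessBuilder.Redirect.DISCARD)
                .start();
        System.out.println("child pid: " + pidOf(child));
        child.waitFor();
        System.out.println("own pid:   " + currentPid());
    }
}
```

Because `Process.pid()` does not exist on jdk8, code shared between jdk8 and jdk11+ would still need a version check or separate source for the two cases.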
The failure on jdk8 Windows is likely because the ProcessKiller logic failed to kill the process. Windows processes can be stubborn about being killed, and we are seeing many orphaned testcase processes on Windows. We are looking at adding some post-testcase process cleanup to avoid this. It would be beneficial in this situation, though, if proc.waitFor() did not wait forever for the process to finish, because it never will. So maybe an arbitrary timeout of 30 minutes or so? https://github.com/eclipse/openj9/blob/efdb86514d722cf83747d9d8badc449fe6121658/test/functional/cmdline_options_tester/src/Test.java#L227 As it stands, it is causing the whole build pipeline to hang all night.
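A rough sketch of the bounded wait suggested above, using `Process.waitFor(long, TimeUnit)` and `Process.destroyForcibly()`, both available since Java 8. The class `BoundedWait`, the method `waitOrKill`, and the 10-second grace period are hypothetical names and values chosen for illustration; this is not the actual code in Test.java.

```java
import java.nio.file.Paths;
import java.util.OptionalInt;
import java.util.concurrent.TimeUnit;

public class BoundedWait {
    // Hypothetical helper: path to the java launcher of the running JDK.
    static String javaBinary() {
        return Paths.get(System.getProperty("java.home"), "bin", "java").toString();
    }

    // Waits up to the given timeout for proc to exit.
    // Returns the exit code, or an empty OptionalInt if the process timed out
    // and a forcible kill was requested.
    static OptionalInt waitOrKill(Process proc, long timeout, TimeUnit unit)
            throws InterruptedException {
        if (proc.waitFor(timeout, unit)) {
            return OptionalInt.of(proc.exitValue());
        }
        // Timed out: request a forcible kill, then give the OS a short,
        // bounded grace period so this call can never block forever either.
        proc.destroyForcibly();
        proc.waitFor(10, TimeUnit.SECONDS);
        return OptionalInt.empty();
    }

    public static void main(String[] args) throws Exception {
        Process proc = new ProcessBuilder(javaBinary(), "-version")
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .redirectError(ProcessBuilder.Redirect.DISCARD)
                .start();
        // 30 minutes mirrors the arbitrary timeout suggested in the comment.
        OptionalInt result = waitOrKill(proc, 30, TimeUnit.MINUTES);
        System.out.println(result.isPresent()
                ? "exit code: " + result.getAsInt()
                : "timed out, process was killed");
    }
}
```

Returning an empty `OptionalInt` on timeout keeps a hang distinguishable from any real exit code, so the caller can report the test as timed out rather than failed.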
Created https://github.com/eclipse/openj9/issues/11196 for the getUnixPid() issue.
Created https://github.com/eclipse/openj9/issues/11197 for the cmdlinetests waiting forever.
Failure link:
https://ci.adoptopenjdk.net/job/Test_openjdk11_j9_sanity.functional_aarch64_linux/1
testSCCMLTests1_openj9_1 Test 26 failed with timeout issue for openjdk11 Aarch64.
Failure output (captured from console output)