adoptium / aqa-tests

Home of test infrastructure for Adoptium builds
https://adoptium.net/aqavit
Apache License 2.0

CRIU Pingperf adds test loop for SharedClasses validation #5541

Open LongyuZhang opened 2 weeks ago

LongyuZhang commented 2 weeks ago

Based on issue https://github.com/eclipse-openj9/openj9/issues/20012, OpenLiberty uses the established shared classes for multiple servers, so we need to extend the Pingperf test to loop several times inside the container to validate the built shared classes. FYI @tajila @llxia

LongyuZhang commented 1 week ago

Tested creating Pingperf checkpoint images with the 0.46 release (grinder link), then ran these images inside a podman container, running the following commands multiple times.

Output is:

sh-4.4$ /opt/ol/wlp/bin/server start
Starting server defaultServer.
CWWKE0953W: This version of Open Liberty is an unsupported early release version.
Server defaultServer started with process ID 1026.
sh-4.4$ /opt/ol/wlp/bin/server stop
Stopping server defaultServer.
Server defaultServer stopped.
sh-4.4$ /opt/ol/wlp/bin/server run defaultServer
[AUDIT   ] Launching defaultServer (Open Liberty 24.0.0.9-beta/wlp-1.0.92.cl240820240729-1903) on Eclipse OpenJ9 VM, version 17.0.12+7 (en_US)
[AUDIT   ] CWWKT0016I: Web application available (default_host): http://363c9c36ea31:9080/pingperf/
[AUDIT   ] CWWKC0452I: The Liberty server process resumed operation from a checkpoint in 0.061 seconds.
[AUDIT   ] CWWKZ0001I: Application pingperf started in 0.062 seconds.
[AUDIT   ] CWWKF0012I: The server installed the following features: [cdi-3.0, concurrent-2.0, jndi-1.0, jsonp-2.0, restfulWS-3.0, restfulWSClient-3.0, servlet-5.0].
[AUDIT   ] CWWKF0011I: The defaultServer server is ready to run a smarter planet. The defaultServer server started in 0.067 seconds.
^C[AUDIT   ] CWWKE0085I: The server defaultServer is stopping because the JVM is exiting.
[AUDIT   ] CWWKE1100I: Waiting for up to 30 seconds for the server to quiesce.
[AUDIT   ] CWWKT0017I: Web application removed (default_host): http://363c9c36ea31:9080/pingperf/
[AUDIT   ] CWWKZ0009I: The application pingperf has stopped successfully.

I was not able to reproduce the error. What extra tests do we need to run to trigger the SCC issue?

tajila commented 1 week ago

How many iterations did you run?

tajila commented 1 week ago

@tjwatson FYI

LongyuZhang commented 1 week ago

> How many iterations did you run?

~Around 10 iterations, I can increase to 50 to have a try.~ Tried 50 iterations of start and stop; the result is the same.
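For reference, the repeated start/stop run described here can be sketched as a small shell function. This is only an illustration, not the script used in the Grinder job: the function name `cycle_server` and the `SERVER_CMD` variable are hypothetical, and the default path is taken from the output earlier in this thread.

```shell
#!/bin/sh
# Sketch only: start and stop a Liberty server repeatedly, stopping at the
# first failing command. SERVER_CMD and cycle_server are illustrative names;
# the default path is the one shown in the output earlier in this thread.
SERVER_CMD="${SERVER_CMD:-/opt/ol/wlp/bin/server}"

# cycle_server ITERATIONS [SERVER_NAME]
# Returns 0 if every start/stop pair succeeded, 1 on the first failure.
cycle_server() {
    iterations="$1"
    name="${2:-defaultServer}"
    i=1
    while [ "$i" -le "$iterations" ]; do
        if ! "$SERVER_CMD" start "$name"; then
            echo "iteration $i: start failed" >&2
            return 1
        fi
        if ! "$SERVER_CMD" stop "$name"; then
            echo "iteration $i: stop failed" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $iterations start/stop iterations"
}
```

Calling `cycle_server 50` inside the container would correspond to the 50 iterations mentioned above. Returning on the first failure keeps the failing iteration number visible in the log.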

tajila commented 1 week ago

Do you have a link to the Dockerfiles that you are using for the test?

LongyuZhang commented 1 week ago

We first build the Semeru image using https://raw.githubusercontent.com/ibmruntimes/semeru-containers/ibm/17/jdk/ubi/ubi9/Dockerfile.open.releases.full. Then, based on this image, we use https://github.com/OpenLiberty/ci.docker/blob/main/releases/latest/beta/Dockerfile.ubi.openjdk21 to build the Open Liberty image. Finally, we build the Pingperf checkpoint on top of it. Detailed steps are in https://hyc-runtimes-jenkins.swg-devops.com/job/Grinder/42970/consoleFull

tajila commented 1 week ago

@tjwatson Do you know what we are doing differently from Liberty testing?

tjwatson commented 1 week ago

> @tjwatson Do you know what we are doing differently from Liberty testing?

Our automated testing does not use container images. Instead, it starts and stops various servers that will be using the same shared classes cache. But we have various other reports of failures in the scripts used to build an application image, such as the configure.sh script, which starts and stops the server many times.
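A rough sketch of that multi-server pattern, for anyone who wants to try it outside the Liberty test suite, is below. It is not the Liberty automated test itself. The function name `cycle_servers` and the `SERVER_CMD` variable are hypothetical, and it assumes the servers end up on the same shared classes cache, which this sketch does not configure or verify.

```shell
#!/bin/sh
# Sketch only: create several Liberty servers, then start and stop each one.
# Assumption: all servers use the same shared classes cache; nothing here
# configures or checks that. SERVER_CMD and cycle_servers are illustrative names.
SERVER_CMD="${SERVER_CMD:-/opt/ol/wlp/bin/server}"

# cycle_servers NAME...
# Returns 0 if every command succeeded, 1 on the first failure.
cycle_servers() {
    # Create every server first; this fails if a server already exists.
    for name in "$@"; do
        "$SERVER_CMD" create "$name" || { echo "create failed: $name" >&2; return 1; }
    done
    # Start and stop each server in turn.
    for name in "$@"; do
        "$SERVER_CMD" start "$name" || { echo "start failed: $name" >&2; return 1; }
        "$SERVER_CMD" stop "$name" || { echo "stop failed: $name" >&2; return 1; }
    done
    echo "cycled $# servers"
}
```

For example, `cycle_servers server1 server2 server3` would create and cycle three servers with made-up names.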

llxia commented 1 week ago

@tjwatson Could you point us to the automated test that identified this issue? We’re interested in exploring the possibility of incorporating it into our testing pipeline to catch such issues earlier.

llxia commented 8 hours ago

@tjwatson could you provide us some more info? Thanks