CristianZatt closed this issue 2 years ago
Hmm. Thanks for the report! I'll set up a VM and try this out.
This seems to be caused by our older CRIU version, which lacks https://github.com/checkpoint-restore/criu/pull/1727. I'll try to merge recent changes into CRaC's CRIU.
Could you check out a preliminary build? https://github.com/AntonKozlov/openjdk-builds/actions/runs/2737755453 If it works fine, I'll integrate changes to the next CRaC build.
Sure, will test when I get home after work and let you know.
FYI:
I also get a segmentation fault with the prior build (17.0+2) (build 17-crac+2-10).
With the jvm 17-crac+0-9 build from the link above, I no longer get a segmentation fault, but it isn't successful. I instead see:
CR: Checkpoint ...
JVM: invalid info for restore provided (may be failed checkpoint)
And trying to run it:
$JAVA_HOME/bin/java -XX:CRaCRestoreFrom=cr
Error (criu/protobuf.c:72): Unexpected EOF on (empty-image)
$ export JAVA_HOME=~/apps/crac/jdk-crac-9
$ mvn --version
Apache Maven 3.8.5 (3599d3414f046de2324203b78ddcf9b5e4388aa0)
Maven home: /home/rob/app/maven
Java version: 17-crac, vendor: N/A, runtime: /home/rob/apps/crac/jdk-crac-9
Default locale: en_NZ, platform encoding: UTF-8
OS name: "linux", version: "5.4.0-122-generic", arch: "amd64", family: "unix"
$ mvn clean package
...
$ sudo rm -rf cr
$ $JAVA_HOME/bin/java -XX:CRaCCheckpointTo=cr -jar target/example-jetty-1.0-SNAPSHOT.jar
2022-07-29 11:46:25.070:INFO::main: Logging initialized @120ms to org.eclipse.jetty.util.log.StdErrLog
2022-07-29 11:46:25.123:INFO:oejs.Server:main: jetty-9.4.30.v20200611; built: 2020-06-11T12:34:51.929Z; git: 271836e4c1f4612f12b7bb13ef5a92a927634b0d; jvm 17-crac+0-9
2022-07-29 11:46:25.155:INFO:oejs.AbstractConnector:main: Started ServerConnector@4aa8f0b4{HTTP/1.1, (http/1.1)}{0.0.0.0:8080}
2022-07-29 11:46:25.158:INFO:oejs.Server:main: Started @218ms
2022-07-29 11:46:41.576:INFO:oejs.AbstractConnector:Thread-9: Stopped ServerConnector@4aa8f0b4{HTTP/1.1, (http/1.1)}{0.0.0.0:8080}
CR: Checkpoint ...
JVM: invalid info for restore provided (may be failed checkpoint)
2022-07-29 11:46:42.072:INFO:oejs.Server:Thread-9: jetty-9.4.30.v20200611; built: 2020-06-11T12:34:51.929Z; git: 271836e4c1f4612f12b7bb13ef5a92a927634b0d; jvm 17-crac+0-9
2022-07-29 11:46:42.074:INFO:oejs.AbstractConnector:Thread-9: Started ServerConnector@4aa8f0b4{HTTP/1.1, (http/1.1)}{0.0.0.0:8080}
2022-07-29 11:46:42.074:INFO:oejs.Server:Thread-9: Started @17134ms
With curl localhost:8080 run in there after the server has started, and $JAVA_HOME/bin/jcmd target/example-jetty-1.0-SNAPSHOT.jar JDK.checkpoint used to trigger the checkpoint.
Could you check and post cr/dump4.log? If it contains messages about insufficient permissions, then just try to re-extract the JDK with sudo. https://github.com/CRaC/docs#jdk.
That was my problem. I didn't extract with sudo!! All working now:
$JAVA_HOME/bin/java -XX:CRaCCheckpointTo=cr -jar target/example-jetty-1.0-SNAPSHOT.jar
2022-07-29 23:23:19.924:INFO::main: Logging initialized @86ms to org.eclipse.jetty.util.log.StdErrLog
2022-07-29 23:23:19.967:INFO:oejs.Server:main: jetty-9.4.30.v20200611; built: 2020-06-11T12:34:51.929Z; git: 271836e4c1f4612f12b7bb13ef5a92a927634b0d; jvm 17-crac+0-9
2022-07-29 23:23:19.998:INFO:oejs.AbstractConnector:main: Started ServerConnector@4aa8f0b4{HTTP/1.1, (http/1.1)}{0.0.0.0:8080}
2022-07-29 23:23:20.000:INFO:oejs.Server:main: Started @172ms
2022-07-29 23:23:37.192:INFO:oejs.AbstractConnector:Thread-9: Stopped ServerConnector@4aa8f0b4{HTTP/1.1, (http/1.1)}{0.0.0.0:8080}
CR: Checkpoint ...
Killed
And then
$JAVA_HOME/bin/java -XX:CRaCRestoreFrom=cr
2022-07-29 23:24:55.718:INFO:oejs.Server:Thread-9: jetty-9.4.30.v20200611; built: 2020-06-11T12:34:51.929Z; git: 271836e4c1f4612f12b7bb13ef5a92a927634b0d; jvm 17-crac+0-9
2022-07-29 23:24:55.721:INFO:oejs.AbstractConnector:Thread-9: Started ServerConnector@4aa8f0b4{HTTP/1.1, (http/1.1)}{0.0.0.0:8080}
2022-07-29 23:24:55.721:INFO:oejs.Server:Thread-9: Started @95893ms
Awesome.
Side note: I own a few libraries that could support CRaC. My JDBC connection pool already has offline()/online() support. I have an abstraction over Jetty + a dependency injection library ... so I can certainly play around with adding in the lifecycle support needed for this.
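For illustration only, here is a rough sketch of how offline()/online() might map onto checkpoint/restore callbacks. Everything in it is hypothetical: `Pool` stands in for a connection pool, and `CheckpointAware` is a local stand-in so the sketch compiles without any dependency. It is modelled loosely on org.crac.Resource, whose real callbacks also take a Context argument and which would be registered through the org.crac Core class rather than called by hand.

```java
public class PoolLifecycleSketch {

    /** Local stand-in for a checkpoint/restore callback pair (NOT the real org.crac.Resource). */
    interface CheckpointAware {
        void beforeCheckpoint() throws Exception;
        void afterRestore() throws Exception;
    }

    /** Hypothetical connection pool that can be taken offline and brought back online. */
    static class Pool {
        private boolean online = true;

        void offline() { online = false; }  // a real pool would close its connections here
        void online() { online = true; }    // a real pool would reopen its connections here
        boolean isOnline() { return online; }
    }

    /** Adapter: take the pool offline before the checkpoint, bring it back after restore. */
    static class PoolResource implements CheckpointAware {
        private final Pool pool;

        PoolResource(Pool pool) { this.pool = pool; }

        @Override
        public void beforeCheckpoint() { pool.offline(); }

        @Override
        public void afterRestore() { pool.online(); }
    }

    public static void main(String[] args) throws Exception {
        Pool pool = new Pool();
        CheckpointAware resource = new PoolResource(pool);

        // Simulate the two callbacks by hand; a CRaC runtime would invoke them itself.
        resource.beforeCheckpoint();
        System.out.println("online at checkpoint: " + pool.isOnline());
        resource.afterRestore();
        System.out.println("online after restore: " + pool.isOnline());
    }
}
```

The point of the adapter is that no sockets should be open while the image is written, and the pool reconnects in whatever environment the image is restored into.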
Yep, worked here on my VM too. Will try to apply it on some of our applications to see if we can get a good result for faster deploys on the development environment.
Nice project! Is there any road map or perspective to make this production ready?
Just to put out my own thoughts re getting more prod ready:
@PostConstruct methods and ensuring that is where all external environment stuff is read (external config isn't read in constructors). The alternative is to set up a sort of "canary prod" instance to create the checkpoint in prod. Hmmm.
@rbygrave @CristianZatt Thanks for checking! I'll include the changes in a future CRaC build.
@CristianZatt There is no schedule yet for production readiness. We need to comply with the OpenJDK quality standards, which cannot be done quickly, unfortunately.
@rbygrave It would be interesting to hear about your experience with the CRaC API. The org.crac library may help with using the API in real-world projects. https://github.com/CRaC/docs#orgcrac
Compression is indeed a useful and requested feature. We are prioritizing it so that it is done sooner. We have it implemented by external scripts in the deployment of https://github.com/CRaC/example-lambda and are planning how to do it in the VM.
Regarding pre-prod vs canary environment, a checkpoint in the canary environment may provide better results, as the profile and compilations will be captured by the image. Would that work for you?
Tried out this example on an Ubuntu 22.04 VM, using both jdk17-crac+1 and jdk17-crac+2. All fine until "$JAVA_HOME/bin/java -XX:CRaCRestoreFrom=cr", which results in a segmentation fault, so the application cannot be restored.
Also tried it on my application, following the step-by-step guide, and the result was the same: image files were generated, but restore failed with a segmentation fault.
Used Maven 3.6.3 running on the provided 17-crac JDKs to package.
In the presentations, it was restored by running a restore.sh script. Is something missing from the documentation, or is there currently a problem with restoring applications?