eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

Liberty 24.0.0.11 InstantOn restore fails on Amazon EKS #20416

Open tam512 opened 4 hours ago

tam512 commented 4 hours ago

Testing Liberty InstantOn restore on Amazon EKS.
The app checkpoint image was built on stg.icr.io/cp/olc/open-liberty-vnext:24.0.0.11-full-java21-openj9-ubi-minimal.
Restoring the checkpoint app image on EKS fails with the following errors in restore.log:

Warn  (criu/kerndat.c:1153): $XDG_RUNTIME_DIR not set. Cannot find location for kerndat file
Error (criu/sockets.c:210): sockets: Diag module missing (-2)
Warn  (criu/kerndat.c:1153): $XDG_RUNTIME_DIR not set. Cannot find location for kerndat file
Error (criu/cr-restore.c:1325): Can't open -1/sys/kernel/ns_last_pid on procfs: Read-only file system
Error (criu/cr-restore.c:1451): Setting PID failed
Error (criu/cr-restore.c:2557): Restoring FAILED.

criu4-logs.zip

github-actions[bot] commented 4 hours ago

Issue Number: 20416
Status: Open
Recommended Components: comp:vm, comp:build, comp:gc
Recommended Assignees: tajila, jasonfengj9, keithc-ca

pshipton commented 4 hours ago

@tajila

tajila commented 3 hours ago

@ymanton Please take a look at this

ymanton commented 2 hours ago

It looks like the clone3 system call is blocked by seccomp. Either seccomp needs to be disabled, or a custom seccomp profile that allows clone3 needs to be used.

tam512 commented 2 hours ago

I added the following to the app deployment YAML, as @ymanton suggested in a Slack discussion, and the app now restores successfully:

securityContext:
    seccompProfile:
        type: Unconfined

Can we improve the error message so users know how to fix the problem?
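
For reference, the Unconfined setting above disables seccomp for the container entirely. The other option mentioned earlier in the thread, a custom seccomp profile that allows clone3, might be wired up roughly as follows. This is only a sketch: the profile file name is hypothetical, the file would have to exist on every node under the kubelet's seccomp directory, and its contents (which would need to allow clone3 and whatever other system calls the restore requires) are not shown or verified here.

```yaml
securityContext:
    seccompProfile:
        # Localhost loads a profile file from the node instead of disabling seccomp
        type: Localhost
        # Hypothetical file name, resolved relative to the kubelet's seccomp directory
        localhostProfile: profiles/criu-restore.json
```

Compared with Unconfined, this keeps seccomp filtering in place for the container, at the cost of having to distribute and maintain the profile file on the nodes.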