Open kgibm opened 1 year ago
I don't think this should have a release bug for something that is still in beta.
Did you have any additional message about criu not having permissions?
@sebratton Could you have a look. The same thing should happen if you run in-container without specifying the necessary --add-cap
options.
Did you have any additional message about criu not having permissions?
$ oc logs libertydiag-596cc64bdd-rdj62
/opt/ol/wlp/bin/server: line 1458: /opt/ol/wlp/output/defaultServer/workarea/checkpoint/restoreTime: Permission denied
/opt/ol/wlp/bin/server: line 1474: /opt/ol/wlp/output/defaultServer/workarea/checkpoint/.env.properties: Permission denied
/opt/ol/wlp/bin/server: line 1413: /usr/sbin/criu: Operation not permitted
CWWKE0957I: Restoring the checkpoint server process failed. Check the /logs/checkpoint/restore.log log to determine why the checkpoint process was not restored. Launching the server without using the checkpoint image.
JVMSHRC226E Error opening shared class cache file
JVMSHRC336E Port layer error code = -102
JVMSHRC337E Platform error message: Permission denied
JVMSHRC840E Failed to start up the shared cache.
JVMSHRC686I Failed to startup shared class cache. Continue without using it as -Xshareclasses:nonfatal is specified
Launching defaultServer (Open Liberty 22.0.0.13-beta/wlp-1.0.71.cl221220221107-1900) on Eclipse OpenJ9 VM, version 17.0.5-ea+8 (en_US)
Confirmed similar results with podman:
Launch container:
$ podman run --rm quay.io/ibm/libertydiag:instanton
/opt/ol/wlp/bin/server: line 1413: /usr/sbin/criu: Operation not permitted
CWWKE0957I: Restoring the checkpoint server process failed. Check the /logs/checkpoint/restore.log log to determine why the checkpoint process was not restored. Launching the server without using the checkpoint image.
Launching defaultServer (Open Liberty 22.0.0.13-beta/wlp-1.0.71.cl221220221107-1900) on Eclipse OpenJ9 VM, version 17.0.5-ea+8 (en_US) [...]
$ podman exec -it $(podman ps -q) ls -l /logs/checkpoint/
total 8
-rw-------. 1 default root 1657 Dec 5 21:13 checkpoint.log
-rw-r-----. 1 default root 54 Dec 5 21:13 stats-dump
/opt/ol/wlp/bin/server: line 1458: /opt/ol/wlp/output/defaultServer/workarea/checkpoint/restoreTime: Permission denied /opt/ol/wlp/bin/server: line 1474: /opt/ol/wlp/output/defaultServer/workarea/checkpoint/.env.properties: Permission denied
These are weird, we should always have write access to the work area. Is there something that makes the container file system read-only here? I would think that would cause "normal" liberty application containers to fail.
/opt/ol/wlp/bin/server: line 1413: /usr/sbin/criu: Operation not permitted
This is the error I was expecting from trying to invoke criu
without the necessary capabilities.
Is there something that makes the container file system read-only here?
Probably related to folder permissions. The user running the container is 1000830000
but the checkpoint
folder is owned by default:root
. I'm not sure where 1000830000
is coming from as I didn't specify it.
$ oc exec libertydiag-596cc64bdd-rdj62 -- whoami
1000830000
$ oc exec libertydiag-596cc64bdd-rdj62 -- grep -e default -e 1000830000 /etc/passwd
default:x:1001:0::/home/default:/usr/sbin/nologin
1000830000:x:1000830000:0:1000830000 user:/:/sbin/nologin
$ oc exec libertydiag-596cc64bdd-rdj62 -- ls -l /opt/ol/wlp/output/defaultServer/workarea/
total 20
drwxr-x---. 1 default root 22 Dec 5 21:13 checkpoint
-rw-r-----. 1 1000830000 root 15004 Dec 12 20:29 equinox.log
drwxrwx---. 1 default root 4096 Dec 12 20:31 org.eclipse.osgi
drwxr-x---. 3 1000830000 root 117 Dec 12 20:30 platform
Running the vanilla container shows it running with the default
user:
$ podman run --rm quay.io/ibm/libertydiag:instanton whoami
default
@jgawor do you know what is going on here with the users on OCP?
Describe the bug
Running an InstantOn image in OpenShift without proper privileges shows the
CWWKE0957I
error but the referenced/logs/checkpoint/restore.log
file does not exist.Steps to Reproduce
oc create deployment libertydiag --image=quay.io/ibm/libertydiag:instanton
CWWKE0957I
:restore.log
:Expected behavior
/logs/checkpoint/restore.log
should exist with details of the InstantOn restore failure.Diagnostic information:
FROM icr.io/appcafe/open-liberty:beta-instanton
IBM Semeru Runtime Open Edition (17.0.5-ea+8)
Additional context
@tjwatson fyi