OpenLiberty / ci.docker

Eclipse Public License 1.0

Ensure other user IDs besides 1001/default can be used to restore InstantOn #420

Closed tjwatson closed 1 year ago

tjwatson commented 1 year ago

Semeru Java is going to provide fixes to CRIU that allow us to restore an InstantOn application using a user ID other than the default (1001). Some changes to the Liberty images may be necessary to make sure the areas Liberty needs to write to are writable by other users, for example the /logs/checkpoint directory that gets created at checkpoint time.

anjumfatima90 commented 1 year ago

@ymanton recommended making the following changes in checkpoint.sh to make it work:

#!/bin/bash

# hack to bump up the pid by 100
for i in {1..100}
do
    pidplus.sh
done

echo "Performing checkpoint --at=$1"
/opt/ol/wlp/bin/server checkpoint defaultServer --at=$1

rc=$?
# Find all directories in logs/ and output/ that the current user has read/write/execute permissions for
# and give the same permissions to the group.
find -L /logs /output -type d -readable -writable -executable -exec chmod g+rwx {} \;

# Find all files in logs/ and output/ that the current user has read/write permissions for
# and give the same permissions to the group.
find -L /logs /output -type f -readable -writable -exec chmod g+rw {} \;

exit $rc

This should provide the required permissions on the /output/workarea/checkpoint and /logs/checkpoint directories.
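
The effect of the two find commands can be checked without building an image. The following is a minimal sketch on a scratch directory; the temporary tree and the file name image.bin are hypothetical stand-ins, not the real Liberty layout, and it assumes GNU find and GNU stat.

```shell
#!/bin/bash
# Hypothetical scratch tree standing in for /logs; not the real Liberty layout.
demo=$(mktemp -d)
mkdir -p "$demo/logs/checkpoint"
touch "$demo/logs/checkpoint/image.bin"

# Start from owner-only permissions, as a restrictive umask might leave them.
chmod 700 "$demo/logs" "$demo/logs/checkpoint"
chmod 600 "$demo/logs/checkpoint/image.bin"

# Same find invocations as in the script above, pointed at the scratch tree.
find -L "$demo/logs" -type d -readable -writable -executable -exec chmod g+rwx {} \;
find -L "$demo/logs" -type f -readable -writable -exec chmod g+rw {} \;

# Record the resulting octal modes, then clean up.
dir_mode=$(stat -c %a "$demo/logs/checkpoint")
file_mode=$(stat -c %a "$demo/logs/checkpoint/image.bin")
echo "dir=$dir_mode file=$file_mode"
rm -rf "$demo"
```

The group bits only help a different user ID if that user runs with the same group as the files, which is an assumption about how the container is started rather than something the script enforces.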

tjwatson commented 1 year ago

Instead of doing the chmod at runtime (in checkpoint.sh), I think we should do it as part of the directory setup in the Dockerfiles. For example in https://github.com/OpenLiberty/ci.docker/blob/307a93003b9d9acbd2eb64f9c1394752ccfcd95d/releases/latest/full/Dockerfile.ubuntu.openjdk17#L86-L113

ymanton commented 1 year ago

IIRC we ran into problems doing things earlier because some files only exist after the checkpoint is taken.

tjwatson commented 1 year ago

> IIRC we ran into problems doing things earlier because some files only exist after the checkpoint is taken.

Yes, I confirmed that creating the directories and changing their permissions in the Dockerfiles does not seem to fix the issue. I'm not sure why, but I'm also not sure it is worth investigating further. I did confirm that the update to the checkpoint.sh script works; I have PR #437 for that. I have confirmed this works against the latest Java 17 EA image, icr.io/appcafe/ibm-semeru-runtimes:open-17-ea-jdk-ubi-amd64, when restoring with a user other than the default 1001 user. I did that by doing a container run with --user 1002 when restoring, for example.
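
For readers who want to try a restore as a different user, the following is a rough sketch of what such a container run might look like. The image name my-instanton-app is hypothetical, and the capability and seccomp flags are an assumption taken from general InstantOn usage, not from this thread. The script only prints the command unless RUN=1 is set.

```shell
#!/bin/bash
# Hypothetical image name; substitute the InstantOn application image you built.
IMAGE="${IMAGE:-my-instanton-app}"
# User ID to restore as; 1002 is the example mentioned in the comment above.
RESTORE_UID="${RESTORE_UID:-1002}"

# Assumption: these capability/seccomp flags are what InstantOn restore needs;
# they are not stated in this thread, so adjust them for your environment.
cmd=(docker run --rm
     --user "$RESTORE_UID"
     --cap-add=CHECKPOINT_RESTORE
     --cap-add=SETPCAP
     --security-opt seccomp=unconfined
     -p 9080:9080
     "$IMAGE")

# Print the command for inspection; execute it only when RUN=1 is set.
printf '%s ' "${cmd[@]}"
echo
if [ "${RUN:-0}" = "1" ]; then
    "${cmd[@]}"
fi
```

Per the discussion in this thread, a restore under a non-default user ID is only expected to succeed once the image is built on a Java release that includes the CRIU fix.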

leochr commented 1 year ago

@tjwatson If the change to checkpoint.sh (from PR #437) won't break / negatively affect existing users of 23.0.0.3 and 23.0.0.6 images then we should consider adding the changes to those as well - since they are maintained and supported. Thanks.

tjwatson commented 1 year ago

> @tjwatson If the change to checkpoint.sh (from PR #437) won't break / negatively affect existing users of 23.0.0.3 and 23.0.0.6 images then we should consider adding the changes to those as well - since they are maintained and supported. Thanks.

23.0.0.3 doesn't support InstantOn.

It will not work until Java has a new release with CRIU support that allows a user other than the one that did the checkpoint to perform the restore. This change just prepares us for when the Java images get updated and we build with the new release of the Java images. It is unclear to me whether we should backport this support to the 23.0.0.6 release, even though our weekly rebuilds will pick up the new Java once the Java images update, and in theory it should work. We would have to test that it works properly, and without automated tests I am not confident making such changes to multiple releases.

tjwatson commented 1 year ago

Closing, this has been fixed.