checkpoint-restore / checkpointctl

A tool for in-depth analysis of container checkpoints
Apache License 2.0
87 stars 15 forks source link

Add support to convert checkpoint archives to OCI images #125

Open Parthiba-Hazra opened 3 months ago

Parthiba-Hazra commented 3 months ago

Resolves: #52

This PR introduce a new command checkpointctl build, which can be used to convert checkpoint archives to OCI images. The command accepts a checkpoint archive and a image name as input and generates an OCI image suitable for use with container runtimes like CRI-O or Podman.

➜  ~ checkpointctl build /var/lib/kubelet/checkpoints/checkpoint-nginx-deployment-85bfd45d84-kfxmm_default-nginx-2024-03-29T11:32:17+05:30.tar test-checkpointctl:v1 

2024/04/03 19:05:26 Created checkpoint image: 7b3135033935815023dcdc810419ee1894621436b4d7622440691e4197db451c
2024/04/03 19:05:26 Image 'test-checkpointctl:v1' created successfully from checkpoint '/var/lib/kubelet/checkpoints/checkpoint-nginx-deployment-85bfd45d84-kfxmm_default-nginx-2024-03-29T11:32:17+05:30.tar'
➜  ~ podman images
REPOSITORY                                 TAG              IMAGE ID      CREATED         SIZE
docker.io/library/test-checkpointctl       v1               7b3135033935  33 seconds ago  15.8 MB

we can inspect the image using podman -

➜  ~  podman image inspect 7b3135033935                           
[
     {
          "Id": "7b3135033935815023dcdc810419ee1894621436b4d7622440691e4197db451c",
          "Digest": "sha256:08ee0bc5036915f52b4a2d7a1a9746302cd4010322ef609b121407589b476078",
          "RepoTags": [
               "docker.io/library/test-checkpointctl:v1"
          ],
          "RepoDigests": [
               "docker.io/library/test-checkpointctl@sha256:08ee0bc5036915f52b4a2d7a1a9746302cd4010322ef609b121407589b476078"
          ],
          "Parent": "",
          "Comment": "",
          "Created": "2024-04-03T13:35:26.175477795Z",
          "Config": {},
          "Version": "",
          "Author": "",
          "Architecture": "amd64",
          "Os": "linux",
          "Size": 15841874,
          "VirtualSize": 15841874,
          "GraphDriver": {
               "Name": "overlay",
               "Data": {
                    "UpperDir": "/var/lib/containers/storage/overlay/fc8fdb8e1d92bfb6fd9d5e448a90993389fc3837f5ba3e226602d3c7b6de2e71/diff",
                    "WorkDir": "/var/lib/containers/storage/overlay/fc8fdb8e1d92bfb6fd9d5e448a90993389fc3837f5ba3e226602d3c7b6de2e71/work"
               }
          },
          "RootFS": {
               "Type": "layers",
               "Layers": [
                    "sha256:fc8fdb8e1d92bfb6fd9d5e448a90993389fc3837f5ba3e226602d3c7b6de2e71"
               ]
          },
          "Labels": null,
          "Annotations": {
               "io.container.manager": "CRI-O",
               "io.kubernetes.cri-o.annotations.checkpoint.name": "nginx",
               "io.kubernetes.cri-o.annotations.checkpoint.namespace": "default",
               "io.kubernetes.cri-o.annotations.checkpoint.pod": "nginx-deployment-85bfd45d84-kfxmm",
               "io.kubernetes.cri-o.annotations.checkpoint.rootfsImage": "docker.io/library/nginx@sha256:52478f8cd6a142fd462f0a7614a7bb064e969a4c083648235d6943c786df8cc7",
               "io.kubernetes.cri-o.annotations.checkpoint.rootfsImageID": "92b11f67642b62bbb98e7e49169c346b30e20cd3c1c034d31087e46924b9312e",
               "io.kubernetes.cri-o.annotations.checkpoint.rootfsImageName": "docker.io/library/nginx:latest",
               "io.kubernetes.cri-o.annotations.checkpoint.runtime.name": "runc",
               "org.opencontainers.image.base.digest": "",
               "org.opencontainers.image.base.name": ""
          },
          "ManifestType": "application/vnd.oci.image.manifest.v1+json",
          "User": "",
          "History": [
               {
                    "created": "2024-04-03T13:35:26.253908385Z",
                    "created_by": "/bin/sh"
               }
          ],
          "NamesHistory": [
               "docker.io/library/test-checkpointctl:v1"
          ]
     }
]
github-actions[bot] commented 3 months ago

Test Results

55 tests  ±0   55 :white_check_mark: ±0   1s :stopwatch: ±0s  1 suites ±0    0 :zzz: ±0   1 files   ±0    0 :x: ±0 

Results for commit 1f4c74ce. ± Comparison against base commit 6c3a2639.

:recycle: This comment has been updated with latest results.

Parthiba-Hazra commented 3 months ago

I will add test cases gradually.

adrianreber commented 3 months ago

Thanks for this PR. It looks like the right approach. Unfortunately, now that the PR exists, I am no longer sure this is right thing to do.

This PR pulls in so many dependencies (over 3000 files), that maybe this was not a good idea after all.

Parthiba-Hazra commented 3 months ago

@adrianreber yes I noticed that also, I am not getting any solution of this problem right now, do you have any advise for me ?

rst0git commented 3 months ago

This PR pulls in so many dependencies (over 3000 files), that maybe this was not a good idea after all.

I am not getting any solution of this problem right now, do you have any advise for me ?

It is definitely not a good idea to add all these dependencies. A much simpler solution would be to use something like exec.Command() and run the buildah binary instead.

rst0git commented 3 months ago

This PR introduce a new command checkpointctl convert

What about using checkpointctl build instead of "convert" as it creates an OCI image similar to podman build and docker build?

adrianreber commented 3 months ago

Not sure about exec. Maybe it should just be a shell script. Not sure.

Parthiba-Hazra commented 3 months ago

Not sure about exec. Maybe it should just be a shell script. Not sure.

If we use shell script, then where should we store it(maybe in $HOME/.checkpointctl/scripts) ? and I don't think embedding the script into the go code as string and then writing them out to temporary files or executing them directlty from memory is a good idea.

adrianreber commented 3 months ago

No, if we offer a shell script it should follow what make install does.

It could also be somewhere in libexec and then be called by exec as suggested by Radostin.

Parthiba-Hazra commented 3 months ago

Now if we use script, we have to ensure that user have buildah installed or we can add buildah installation as a part of checkpointctl installation.

adrianreber commented 3 months ago

Now if we use script, we have to ensure that user have buildah installed or we can add buildah installation as a part of checkpointctl installation.

Yes, we will document it and the actual dependencies is up to the package managers in the distributions. The script can also check for buildah and exit with an error message if it is not there.

Parthiba-Hazra commented 2 months ago

@adrianreber @rst0git PTAL, If there is no major refactoring required, I can start working on tests.

adrianreber commented 2 months ago

Why does the PR now still add over 16000 lines of code? Something is not right.

adrianreber commented 2 months ago

Why does the PR now still add over 16000 lines of code? Something is not right.

Parthiba-Hazra commented 2 months ago

Why does the PR now still add over 16000 lines of code? Something is not right.

yes I am checking that.

codecov-commenter commented 2 months ago

Codecov Report

Attention: Patch coverage is 8.62069% with 106 lines in your changes are missing coverage. Please review.

Project coverage is 72.81%. Comparing base (6c3a263) to head (8496613).

Files Patch % Lines
internal/image_from_checkpoint_archive.go 0.00% 91 Missing :warning:
cmd/build.go 34.78% 15 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #125 +/- ## ========================================== - Coverage 78.73% 72.81% -5.92% ========================================== Files 11 13 +2 Lines 1260 1376 +116 ========================================== + Hits 992 1002 +10 - Misses 201 307 +106 Partials 67 67 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

adrianreber commented 2 months ago

Good first try. I must confess I am still unsure which direction we should go, but let's try and see where this leads us.

Please add your script to the shellcheck target and please also take a look at the tool shfmt and add it during GitHub Actions CI runs. I recently introduce shfmt in another project. You can take a look at what I did there: https://github.com/openhpc/ohpc/blob/3.x/tests/ci/Makefile#L74

adrianreber commented 2 months ago

For the destination directory of the script I am currently thinking about /usr/libexec/.... That seems to be Fedora specific, but the git package for example uses it also extensively for internal scripts on my system:

$ rpm -ql git | grep libexec
/usr/libexec/git-core/git-contacts
/usr/libexec/git-core/git-credential-netrc
/usr/libexec/git-core/git-filter-branch
/usr/libexec/git-core/git-request-pull

We need to figure out what other distributions are using for internal scripts.

adrianreber commented 2 months ago

I would also say, instead of container engine specific annotations, which are almost not used at all (the only important one if name for CRI-O at least, but we can also change that), let's own the annotations in this package. We should also move the annotation definition to lib, so that all projects reading the annotations can easily import them from checkpointctl. Maybe start almost all annotation with checkpointctl.annotation.

@rst0git do you remember which annotation are actually used by Podman? CRI-O is just looking for io.kubernetes.cri-o.annotations.checkpoint.name.

rst0git commented 2 months ago

We currently use the runtime name to identify an OCI image that contains checkpoint. This functionality needs be extended (in Podman and CRI-O) to verify the comparability of the system before running criu restore. For example, we need to make sure that CRIU has the same version or newer, the CPU architecture is the same, the kernel version is compatible, etc.

adrianreber commented 2 months ago

We currently use the runtime name to identify an OCI image that contains checkpoint. This functionality needs be extended (in Podman and CRI-O) to verify the comparability of the system before running criu restore. For example, we need to make sure that CRIU has the same version or newer, the CPU architecture is the same, the kernel version is compatible, etc.

Right. But are any of the annotations already verified today or is that just an idea for the future?

rst0git commented 2 months ago

Right. But are any of the annotations already verified today or is that just an idea for the future?

In Podman we currently use only IsCheckpointImage to check annotations.

checkpointRuntimeName, found := imgData[i].Annotations[define.CheckpointAnnotationRuntimeName]
if !found {
    return false, fmt.Errorf("image is not a checkpoint: %s", imgID)
}
if hostInfo.Host.OCIRuntime.Name != checkpointRuntimeName {
    return false, fmt.Errorf("container image \"%s\" requires runtime: \"%s\"", imgID, checkpointRuntimeName)
}
rst0git commented 2 months ago

I would also say, instead of container engine specific annotations, which are almost not used at all (the only important one if name for CRI-O at least, but we can also change that), let's own the annotations in this package. We should also move the annotation definition to lib, so that all projects reading the annotations can easily import them from checkpointctl. Maybe start almost all annotation with checkpointctl.annotation.

This is a good idea. @Parthiba-Hazra would you like to open a separate pull request that adds the annotations from Podman and CRI-O? Once this PR is merged, we can update Podman and CRI-O to use the annotations from checkpointctl.

Parthiba-Hazra commented 2 months ago

For the destination directory of the script I am currently thinking about /usr/libexec/.... That seems to be Fedora specific, but the git package for example uses it also extensively for internal scripts on my system:

$ rpm -ql git | grep libexec
/usr/libexec/git-core/git-contacts
/usr/libexec/git-core/git-credential-netrc
/usr/libexec/git-core/git-filter-branch
/usr/libexec/git-core/git-request-pull

We need to figure out what other distributions are using for internal scripts.

I thought /usr/libexec is debian specific cause somewhere I see that it can be present under usr/local/libexec, maybe both possible . well let me check this.

rst0git commented 2 months ago

@Parthiba-Hazra /usr/libexec is defined in the Filesystem Hierarchy Standard (FHS) as follows:

/usr/libexec includes internal binaries that are not intended to be executed directly by users or shell scripts. Applications may use a single subdirectory under /usr/libexec.

https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch04s07.html

/usr/local/ is used for local data that is specific to the host.

Parthiba-Hazra commented 2 months ago

I would also say, instead of container engine specific annotations, which are almost not used at all (the only important one if name for CRI-O at least, but we can also change that), let's own the annotations in this package. We should also move the annotation definition to lib, so that all projects reading the annotations can easily import them from checkpointctl. Maybe start almost all annotation with checkpointctl.annotation.

This is a good idea. @Parthiba-Hazra would you like to open a separate pull request that adds the annotations from Podman and CRI-O? Once this PR is merged, we can update Podman and CRI-O to use the annotations from checkpointctl.

@rst0git @adrianreber do we need to keep those annotation's format unchanged or change it to similar what we are doing here(checkpointctl.annotation)?

adrianreber commented 2 months ago

I would also say, instead of container engine specific annotations, which are almost not used at all (the only important one if name for CRI-O at least, but we can also change that), let's own the annotations in this package. We should also move the annotation definition to lib, so that all projects reading the annotations can easily import them from checkpointctl. Maybe start almost all annotation with checkpointctl.annotation.

This is a good idea. @Parthiba-Hazra would you like to open a separate pull request that adds the annotations from Podman and CRI-O? Once this PR is merged, we can update Podman and CRI-O to use the annotations from checkpointctl.

@rst0git @adrianreber do we need to keep those annotation's format unchanged or change it to similar what we are doing here(checkpointctl.annotation)?

We talked a bit about it and we are not sure. Naming things is hard :wink: .

As this is all about CRIU I think we could go with something like org.criu.checkpointctl.name. I would not include annotation in the annotation.

Parthiba-Hazra commented 2 months ago

I would also say, instead of container engine specific annotations, which are almost not used at all (the only important one if name for CRI-O at least, but we can also change that), let's own the annotations in this package. We should also move the annotation definition to lib, so that all projects reading the annotations can easily import them from checkpointctl. Maybe start almost all annotation with checkpointctl.annotation.

This is a good idea. @Parthiba-Hazra would you like to open a separate pull request that adds the annotations from Podman and CRI-O? Once this PR is merged, we can update Podman and CRI-O to use the annotations from checkpointctl.

@rst0git @adrianreber do we need to keep those annotation's format unchanged or change it to similar what we are doing here(checkpointctl.annotation)?

We talked a bit about it and we are not sure. Naming things is hard 😉 .

As this is all about CRIU I think we could go with something like org.criu.checkpointctl.name. I would not include annotation in the annotation.

yes, this looks way better and reasonable - org.criu.checkpointctl.name.

Parthiba-Hazra commented 2 months ago

@adrianreber @rst0git Any update on the annotation's name? Is this the final format - org.criu.checkpointctl.checkpoint.name?

Also, shouldn't this annotation's name be org.criu.checkpointctl.checkpoint.container instead of checkpoint.name?

        // CheckpointAnnotationName is used when creating an OCI image
    // from a checkpoint archive to specify the name of the checkpoint.
    CheckpointAnnotationName = "checkpointctl.annotation.checkpoint.name"
rst0git commented 2 months ago

Any update on the annotation's name? Is this the final format - org.criu.checkpointctl.checkpoint.name?

@Parthiba-Hazra If you open another pull request with the updated annotations, we can discuss the format there.