checkpoint-restore / checkpointctl

A tool for in-depth analysis of container checkpoints
Apache License 2.0

Do not unpack the whole checkpoint archive #51

Closed: adrianreber closed this issue 1 year ago

adrianreber commented 1 year ago

Currently checkpointctl is not really clever when it comes to showing information about a checkpoint archive: archives are always unpacked completely, regardless of which information is requested.

It would be enough, depending on the options used, to only unpack one or two files. This would require less space in /tmp and probably be faster.

The size of the checkpoint can also be figured out from the tar headers and does not require actual unpacking of the files.

rst0git commented 1 year ago

It is worth noting that /tmp has a limited size, and checkpointctl show currently fails with large checkpoints because it tries to extract all files before reading any information. Extracting only the necessary files would fix this and significantly improve the performance of checkpointctl.

For example, the following command can be used to extract only the stats-dump file from a container checkpoint created with CRI-O:

tar --extract --file=./checkpoint.tar stats-dump

behouba commented 1 year ago

@adrianreber, @rst0git: I am working on a proposal for issue #53, and it appears that this issue needs to be solved first (e.g., to avoid unpacking all checkpoint files just to get the size of the checkpoint).
What do you think is the preferred approach to solving this issue? I have considered two options:

I am also wondering if there is any alternative approach using the archive package that I might not be aware of.

adrianreber commented 1 year ago

> It appears that this issue needs to be solved first (e.g., to avoid unpacking all checkpoint files to get the size of the checkpoint).

Yes, I agree.

> Use the tar CLI inside the code, as in the example given earlier by @rst0git

This is not really an option from my point of view and should be avoided. I also think it is not necessary.

There are two problems you need to solve. The first is to extract only certain files. This can easily be done the way we did it in CRI-O: https://github.com/cri-o/cri-o/blob/main/server/container_restore.go#L107

You can use an exclude list.

The other part, getting the size of the rootfs and of the actual checkpoint data, is a bit more difficult. From my point of view, you need access to the low-level tar functionality: https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/checkpoint_container.go#L171

That part is also easy, but the tar archive could be compressed. So you first need to apply the appropriate decompression and then get the size of each file by reading it from its tar header.

Without looking too deeply into it, I think it might be possible to use the function DecompressStream() from https://github.com/containers/storage/blob/main/pkg/archive/archive.go#L182. Once you have a decompressed stream, you should be able to loop over all files in the tar archive to extract the size information.

behouba commented 1 year ago

Thank you so much for the detailed response.