checkpoint-restore / checkpointctl

A tool for in-depth analysis of container checkpoints
Apache License 2.0

Proof of concept on the usage of `crit` features for reading memory pages #69

Closed behouba closed 11 months ago

behouba commented 1 year ago

This proof of concept introduces the following new flags:

Example output

Information about the new flags:

$ ./checkpointctl show --help
Show information about available checkpoints

Usage:
  checkpointctl show [flags]

Flags:
      --all               Display all additional information about the checkpoints
      --full-paths        Display mounts with full paths
  -h, --help              help for show
      --mounts            Print overview about mounts used in the checkpoints
      --pid uint32        The PID of the process to display. This option should be used with [--ps-args | --ps-env-vars | --ps-memory-pages]
      --ps-args           Display the arguments for the specified pid
      --ps-env-vars       Display environment variables for the specified pid
      --ps-memory-pages   Display memory pages for the specified pid in hexdump format
      --stats             Print checkpointing statistics if available

Example of ps-args:

$ ./checkpointctl show --pid=1 --ps-args /path/to/checkpoint.tar.gz 

Displaying container checkpoint data from /path/to/checkpoint.tar.gz 

+---------------+---------------------------------+--------------+---------+---------------------------+--------+------------+-------------------+
|   CONTAINER   |              IMAGE              |      ID      | RUNTIME |          CREATED          | ENGINE | CHKPT SIZE | ROOT FS DIFF SIZE |
+---------------+---------------------------------+--------------+---------+---------------------------+--------+------------+-------------------+
| awesome_booth | docker.io/library/ubuntu:latest | 695b77deb382 | crun    | 2023-03-08T08:45:33+03:00 | Podman | 2.8 MiB    | 309.0 KiB         |
+---------------+---------------------------------+--------------+---------+---------------------------+--------+------------+-------------------+

Arguments of process 1 

/bin/bash 

Example of ps-env-vars:

$ ./checkpointctl show --pid=1 --ps-env-vars /path/to/checkpoint.tar.gz  

Displaying container checkpoint data from /path/to/checkpoint.tar.gz 

+---------------+---------------------------------+--------------+---------+---------------------------+--------+------------+-------------------+
|   CONTAINER   |              IMAGE              |      ID      | RUNTIME |          CREATED          | ENGINE | CHKPT SIZE | ROOT FS DIFF SIZE |
+---------------+---------------------------------+--------------+---------+---------------------------+--------+------------+-------------------+
| awesome_booth | docker.io/library/ubuntu:latest | 695b77deb382 | crun    | 2023-03-08T08:45:33+03:00 | Podman | 2.8 MiB    | 309.0 KiB         |
+---------------+---------------------------------+--------------+---------+---------------------------+--------+------------+-------------------+

Environment variables of process 1 

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
TERM=xterm
container=podman
HOME=/root
HOSTNAME=695b77deb382
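The listing above is the kind of result one gets by decoding a process environment block. As a minimal sketch, assuming the block is stored as NUL-terminated `KEY=VALUE` strings (the layout of `/proc/<pid>/environ`), the decoding step could look like this. The `envVar` type and `parseEnv` function are hypothetical names, not part of checkpointctl or go-criu.

```go
package main

import (
	"fmt"
	"strings"
)

// envVar is a hypothetical key/value pair; it is not a checkpointctl type.
type envVar struct{ Key, Value string }

// parseEnv decodes a buffer of NUL-terminated KEY=VALUE strings.
// Only the first '=' separates key and value, so values may contain '='.
func parseEnv(buf []byte) []envVar {
	var vars []envVar
	for _, entry := range strings.Split(string(buf), "\x00") {
		if entry == "" {
			continue // skip the empty element after the final terminator
		}
		key, value, _ := strings.Cut(entry, "=")
		vars = append(vars, envVar{Key: key, Value: value})
	}
	return vars
}

func main() {
	raw := []byte("TERM=xterm\x00container=podman\x00HOME=/root\x00")
	for _, v := range parseEnv(raw) {
		fmt.Printf("%s=%s\n", v.Key, v.Value)
	}
}
```

Keeping the variables in a slice rather than a map preserves the order in which they appear in memory, which matches the order shown in the example output.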

Example of ps-memory-pages:

$ ./checkpointctl show --pid=1 --ps-memory-pages /path/to/checkpoint.tar.gz | head -n 25

Displaying container checkpoint data from /path/to/checkpoint.tar.gz

+---------------+---------------------------------+--------------+---------+---------------------------+--------+------------+-------------------+
|   CONTAINER   |              IMAGE              |      ID      | RUNTIME |          CREATED          | ENGINE | CHKPT SIZE | ROOT FS DIFF SIZE |
+---------------+---------------------------------+--------------+---------+---------------------------+--------+------------+-------------------+
| awesome_booth | docker.io/library/ubuntu:latest | 695b77deb382 | crun    | 2023-03-08T08:45:33+03:00 | Podman | 2.8 MiB    | 309.0 KiB         |
+---------------+---------------------------------+--------------+---------+---------------------------+--------+------------+-------------------+

Memory pages info (vaddr, hexadecimal, ascii) of process 1 

5614ced87000  f3 0f 1e fa 48 83 ec 08  48 8b 05 e1 de 11 00 48   |....H...H......H|
5614ced87010  85 c0 74 02 ff d0 48 83  c4 08 c3 00 00 00 00 00   |..t...H.........|
5614ced87020  ff 35 c2 d6 11 00 f2 ff  25 c3 d6 11 00 0f 1f 00   |.5......%.......|
5614ced87030  f3 0f 1e fa 68 00 00 00  00 f2 e9 e1 ff ff ff 90   |....h...........|
*
5614ced870a0  f3 0f 1e fa 68 07 00 00  00 f2 e9 71 ff ff ff 90   |....h......q....|
5614ced870b0  f3 0f 1e fa 68 08 00 00  00 f2 e9 61 ff ff ff 90   |....h......a....|
5614ced870c0  f3 0f 1e fa 68 09 00 00  00 f2 e9 51 ff ff ff 90   |....h......Q....|
5614ced870d0  f3 0f 1e fa 68 0a 00 00  00 f2 e9 41 ff ff ff 90   |....h......A....|
5614ced870e0  f3 0f 1e fa 68 0b 00 00  00 f2 e9 31 ff ff ff 90   |....h......1....|
5614ced870f0  f3 0f 1e fa 68 0c 00 00  00 f2 e9 21 ff ff ff 90   |....h......!....|
5614ced87100  f3 0f 1e fa 68 0d 00 00  00 f2 e9 11 ff ff ff 90   |....h...........|
*
5614ced871a0  f3 0f 1e fa 68 17 00 00  00 f2 e9 71 fe ff ff 90   |....h......q....|
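The layout above (virtual address, sixteen hex bytes in two groups of eight, ASCII column) can be reproduced with a few lines of Go. The following is a minimal sketch rather than the code from this pull request: `dumpPages` and `dumpToString` are hypothetical names, and the `*` line is assumed to follow the `hexdump -C` convention of collapsing rows identical to the previous one.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"strings"
)

// dumpPages writes data as 16-byte rows in a "vaddr  hex  |ascii|" layout.
// Rows identical to the previously printed row are collapsed into one "*".
func dumpPages(w io.Writer, vaddr uint64, data []byte) error {
	const width = 16
	var prev []byte
	collapsed := false
	for off := 0; off < len(data); off += width {
		end := off + width
		if end > len(data) {
			end = len(data)
		}
		row := data[off:end]
		if prev != nil && bytes.Equal(row, prev) {
			if !collapsed {
				if _, err := fmt.Fprintln(w, "*"); err != nil {
					return err
				}
				collapsed = true
			}
			continue
		}
		collapsed = false
		prev = row

		var hexPart, asciiPart strings.Builder
		for i := 0; i < width; i++ {
			if i == width/2 {
				hexPart.WriteByte(' ') // extra gap between the two 8-byte halves
			}
			if i >= len(row) {
				hexPart.WriteString("   ") // pad a short final row
				continue
			}
			fmt.Fprintf(&hexPart, "%02x ", row[i])
			if row[i] >= 0x20 && row[i] <= 0x7e {
				asciiPart.WriteByte(row[i])
			} else {
				asciiPart.WriteByte('.') // non-printable byte
			}
		}
		_, err := fmt.Fprintf(w, "%012x  %s  |%s|\n",
			vaddr+uint64(off), hexPart.String(), asciiPart.String())
		if err != nil {
			return err
		}
	}
	return nil
}

// dumpToString is a convenience wrapper for small buffers.
func dumpToString(vaddr uint64, data []byte) string {
	var sb strings.Builder
	_ = dumpPages(&sb, vaddr, data) // writes to a strings.Builder do not fail
	return sb.String()
}

func main() {
	page := []byte{0xf3, 0x0f, 0x1e, 0xfa, 0x48, 0x83, 0xec, 0x08,
		0x48, 0x8b, 0x05, 0xe1, 0xde, 0x11, 0x00, 0x48}
	if err := dumpPages(os.Stdout, 0x5614ced87000, page); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Writing to an `io.Writer` instead of building one large string keeps memory use bounded when the dumped region is large. Unlike `hexdump -C`, this sketch does not print a final offset line after a collapsed run.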

I am still uncertain about the design of the new flags. 🤔

Please note that the CI is expected to fail because the version of the crit package used in this code is copied from https://github.com/checkpoint-restore/go-criu/pull/133.

codecov-commenter commented 1 year ago

Codecov Report

Patch coverage: 76.00% and project coverage change: -0.49 :warning:

Comparison is base (c3bcdc6) 80.00% compared to head (9efa578) 79.51%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #69      +/-   ##
==========================================
- Coverage   80.00%   79.51%   -0.49%
==========================================
  Files           3        3
  Lines         435      454      +19
==========================================
+ Hits          348      361      +13
- Misses         64       70       +6
  Partials       23       23
```

| [Impacted Files](https://app.codecov.io/gh/checkpoint-restore/checkpointctl/pull/69?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None) | Coverage Δ | |
|---|---|---|
| [container.go](https://app.codecov.io/gh/checkpoint-restore/checkpointctl/pull/69?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-Y29udGFpbmVyLmdv) | `79.22% <71.42%> (-1.00%)` | :arrow_down: |
| [checkpointctl.go](https://app.codecov.io/gh/checkpoint-restore/checkpointctl/pull/69?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-Y2hlY2twb2ludGN0bC5nbw==) | `87.28% <100.00%> (+0.33%)` | :arrow_up: |


github-actions[bot] commented 1 year ago

Test Results

26 tests ±0   26 :heavy_check_mark: ±0   0s :stopwatch: ±0s
 1 suites ±0    0 :zzz: ±0
 1 files  ±0    0 :x: ±0

Results for commit 5f24f390. ± Comparison against base commit ba8d41b1.

:recycle: This comment has been updated with latest results.

adrianreber commented 1 year ago

Now that #56 is merged maybe you could extend the process view with the full command-line. Instead of:

counter
└── [1]  bash
    ├── [7]  bash
    ├── [7]  counter.py
    ├── [8]  bash
    └── [8]  tee

This uses the information from the comm field of the core image. In your example I see the output that can also be seen when running ps aux: you copy the complete command line from the pages images. Maybe that would be a nice first use case for your go-criu interface.

You can include go-criu pinned to a commit SHA and date, that is, the current latest version from git rather than a released version.
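A process's argument area in memory holds its arguments as consecutive NUL-terminated strings, the same layout that `/proc/<pid>/cmdline` exposes. As a minimal sketch, assuming the raw bytes of that area have already been read from the checkpoint, turning them into a `ps`-style command line could look like this. `splitNul` and `commandLine` are hypothetical helpers, not go-criu functions.

```go
package main

import (
	"fmt"
	"strings"
)

// splitNul splits a buffer of NUL-terminated strings into its elements.
// Trailing NUL bytes are trimmed first, so trailing empty arguments are dropped.
func splitNul(buf []byte) []string {
	s := strings.TrimRight(string(buf), "\x00")
	if s == "" {
		return nil
	}
	return strings.Split(s, "\x00")
}

// commandLine joins the arguments with spaces for display, similar to ps.
func commandLine(buf []byte) string {
	return strings.Join(splitNul(buf), " ")
}

func main() {
	// Illustrative bytes only; a real buffer would come from the pages image.
	raw := []byte("python\x00counter.py\x00--input\x00data.txt\x00")
	fmt.Println(commandLine(raw))
}
```

Joining with spaces loses the distinction between an argument that contains a space and two separate arguments, which is the same ambiguity `ps` output has.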

behouba commented 1 year ago

With the introduction of the inspect sub-command, I believe that some of the features for reading memory pages content should also be part of inspect parameters. I would like to discuss how we should incorporate these features.

Examples:

$ checkpointctl inspect /path/to/checkpoint.tar.gz --ps-tree
counter
└── [1]  bash
    ├── [7]  bash
    ├── [7]  counter.py
    ├── [8]  bash
    └── [8]  tee
$ checkpointctl inspect /path/to/checkpoint.tar.gz --ps-tree --ps-args
counter
└── [1]  bash
    ├── [7]  bash -c 'python counter.py'
    ├── [7]  python counter.py --input data.txt --output result.txt
    ├── [8]  bash -c 'tee output.log'
    └── [8]  tee output.log

Example:

$ checkpointctl inspect /path/to/checkpoint.tar.gz --ps-tree --ps-env-vars
counter
└── [1]  bash
    ├── [7]  bash
    │   ├── [7]  counter.py
    │   │   ├── [7]  PYTHONPATH=/usr/local/lib/python3.9/site-packages
    │   │   └── [7]  DEBUG_MODE=true
    │   ├── [8]  bash
    │   └── [8]  tee
    └── [8]  tee
        └── [8]  OUTPUT_DIR=/var/log

What do you think, @rst0git, @adrianreber?
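For reference, the tree layouts proposed above can be produced by a short recursive renderer. This is a minimal sketch with a hypothetical `node` type and hand-rolled drawing code, not the tree library or data structures that checkpointctl actually uses. The label of each node can hold either the short process name or the full command line.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical process-tree entry used only for this sketch.
type node struct {
	pid      uint32
	label    string // process name, or the full command line when requested
	children []*node
}

// render draws the tree under a root title using box-drawing characters.
func render(root string, top []*node) string {
	var b strings.Builder
	b.WriteString(root + "\n")
	renderChildren(&b, top, "")
	return b.String()
}

// renderChildren prints each node and recurses with an extended prefix.
func renderChildren(b *strings.Builder, nodes []*node, prefix string) {
	for i, n := range nodes {
		branch, next := "├── ", "│   "
		if i == len(nodes)-1 {
			branch, next = "└── ", "    " // last sibling closes the branch
		}
		fmt.Fprintf(b, "%s%s[%d]  %s\n", prefix, branch, n.pid, n.label)
		renderChildren(b, n.children, prefix+next)
	}
}

func main() {
	tree := []*node{{pid: 1, label: "bash", children: []*node{
		{pid: 7, label: "python counter.py"},
		{pid: 8, label: "tee output.log"},
	}}}
	fmt.Print(render("counter", tree))
}
```

Because only the label changes between the plain view and the view with arguments, a flag such as the proposed --ps-args would not need a second rendering path in a design like this one.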

rst0git commented 1 year ago

I am wondering if we can keep a default output and add another flag to display the full command line from memory pages?

It makes sense. For example, we can introduce something like the following two options:

for inspecting the content of a process's memory pages. I am not sure whether it makes sense to have this alongside the tree or JSON view.

The content of memory pages could be several gigabytes and it might not be appropriate to show all of it in a terminal. However, we can introduce a sub-command, for example checkpointctl memparser, that can save the output in a file. We can also extend this sub-command with additional options for more effective memory analysis.
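One way to keep a multi-gigabyte dump out of the terminal is to treat the destination as an `io.Writer` and stream to it in chunks. The sketch below assumes a hypothetical output path parameter, where an empty path means STDOUT. The function names are illustrative and do not come from checkpointctl.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// openOutput returns STDOUT when path is empty, otherwise a newly created file.
// The returned close function must be called once writing is finished.
func openOutput(path string) (io.Writer, func() error, error) {
	if path == "" {
		return os.Stdout, func() error { return nil }, nil
	}
	f, err := os.Create(path)
	if err != nil {
		return nil, nil, err
	}
	return f, f.Close, nil
}

// streamTo copies src to dst through a buffer, so the whole dump is never
// held in memory at once.
func streamTo(dst io.Writer, src io.Reader) (int64, error) {
	bw := bufio.NewWriter(dst)
	n, err := io.Copy(bw, src)
	if err != nil {
		return n, err
	}
	return n, bw.Flush()
}

// run wires the two helpers together and reports the first error.
func run(path string, src io.Reader) error {
	out, closeOut, err := openOutput(path)
	if err != nil {
		return err
	}
	if _, err := streamTo(out, src); err != nil {
		closeOut()
		return err
	}
	return closeOut()
}

func main() {
	// An empty path selects STDOUT in this sketch.
	if err := run("", strings.NewReader("example dump line\n")); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Returning the close function separately lets the caller surface errors from closing a file while leaving STDOUT open.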

@adrianreber What do you think?

adrianreber commented 12 months ago

What do you think, @rst0git, @adrianreber?

Looks good.

rst0git commented 11 months ago

@behouba A container checkpoint may include multiple processes, each with a potentially large amount of memory. To analyze the memory of a checkpoint, it might be useful to display an overview of the memory size of each process, as well as to show the memory pages for a specific process.

Would it make sense to implement this as a table showing all PIDs, process names, and memory size for each process? For example, when the --pid option is specified, it could show the memory pages (vaddr, hexadecimal, ascii) for that PID, and an --output option could allow writing the output to a specified file instead of STDOUT.

We could implement this as a new sub-command for checkpointctl with the description "Analyze container checkpoint memory", or extend the existing sub-commands (show and inspect).
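The overview table described above could be sketched as follows. The per-process memory size is computed as the number of dumped pages multiplied by the page size. The `procMem` type, the 4096-byte page size, and the column titles are assumptions made for illustration, not the eventual checkpointctl design.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"text/tabwriter"
)

// pageSize is an assumed value; the real one depends on the checkpointed system.
const pageSize = 4096

// procMem is a hypothetical summary of one checkpointed process.
type procMem struct {
	PID     uint32
	Name    string
	NrPages []uint32 // page count of each dumped memory region
}

// size returns the number of bytes covered by all regions.
func (p procMem) size() uint64 {
	var pages uint64
	for _, n := range p.NrPages {
		pages += uint64(n)
	}
	return pages * pageSize
}

// formatSize renders a byte count with binary units (KiB, MiB, ...).
func formatSize(b uint64) string {
	const unit = 1024
	if b < unit {
		return fmt.Sprintf("%d B", b)
	}
	div, exp := uint64(unit), 0
	for n := b / unit; n >= unit; n /= unit {
		div *= unit
		exp++
	}
	return fmt.Sprintf("%.1f %ciB", float64(b)/float64(div), "KMGTPE"[exp])
}

// writeTable prints one aligned row per process.
func writeTable(w io.Writer, procs []procMem) error {
	tw := tabwriter.NewWriter(w, 0, 8, 2, ' ', 0)
	fmt.Fprintln(tw, "PID\tPROCESS NAME\tMEMORY SIZE")
	for _, p := range procs {
		fmt.Fprintf(tw, "%d\t%s\t%s\n", p.PID, p.Name, formatSize(p.size()))
	}
	return tw.Flush()
}

func main() {
	procs := []procMem{
		{PID: 1, Name: "bash", NrPages: []uint32{120, 36}},
		{PID: 7, Name: "counter.py", NrPages: []uint32{2048}},
	}
	if err := writeTable(os.Stdout, procs); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Since `writeTable` accepts any `io.Writer`, the same function would serve both STDOUT and a file selected by an option such as the proposed --output.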

@behouba @adrianreber What do you think?

behouba commented 11 months ago

Would it make sense to implement this as a table showing all PIDs, process names, and memory size for each process? For example, when the --pid option is specified, it could show the memory pages (vaddr, hexadecimal, ascii) for that PID, and an --output option could allow writing the output to a specified file instead of STDOUT.

Implementing this as a table with PIDs, process names, and memory sizes for each process seems like a good approach to me. Also, the --pid and --output options would add flexibility and make it easier to view the content of the memory pages.

We could implement this as a new sub-command for checkpointctl with the description "Analyze container checkpoint memory", or extend the existing sub-commands (show and inspect).

In my opinion, adding a new sub-command for this use case is more appropriate. However, I am unsure about the most suitable name for this sub-command. What about memscan or memparser, as you suggested earlier, @rst0git?

rst0git commented 11 months ago

What about memscan or memparser, as you suggested earlier, @rst0git?

checkpointctl memparse might be more appropriate.

"parse" is defined as "to examine computer data and change it into a form that can be easily read or understood".

The following wiki page has some relevant examples of the Volatility interface for memory analysis: https://github.com/volatilityfoundation/volatility/wiki/Linux-Command-Reference#process-memory

behouba commented 11 months ago

"parse" is defined as "to examine computer data and change it into a form that can be easily read or understood".

I totally agree, memparse is more appropriate according to the definition. Thank you @rst0git, I will work on that.

rst0git commented 11 months ago

Closing in favour of https://github.com/checkpoint-restore/checkpointctl/pull/95