checkpoint-restore / checkpointctl

A tool for in-depth analysis of container checkpoints
Apache License 2.0
95 stars 15 forks source link

feat: add flag to view process tree #56

Closed snprajwal closed 1 year ago

snprajwal commented 1 year ago

CRIT already allows us to explore the process tree, file descriptors, memory map, and RSS map for a checkpoint. These features can be integrated with checkpointctl to allow more info to be displayed with checkpointctl show. This PR introduces checkpointctl show --ps-tree for viewing the process tree.

Here's a sample output of the tree displayed for a single process:

Process tree

Container
└── [1]  piggie
github-actions[bot] commented 1 year ago

Test Results

24 tests  +1   24 :heavy_check_mark: +1   0s :stopwatch: ±0s   1 suites ±0     0 :zzz: ±0    1 files   ±0     0 :x: ±0 

Results for commit 4a28ac18. ± Comparison against base commit 34bab996.

:recycle: This comment has been updated with latest results.

codecov-commenter commented 1 year ago

Codecov Report

Patch coverage: 71.79% and project coverage change: -0.72 :warning:

Comparison is base (34bab99) 80.56% compared to head (296b668) 79.85%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #56 +/- ## ========================================== - Coverage 80.56% 79.85% -0.72% ========================================== Files 3 3 Lines 391 407 +16 ========================================== + Hits 315 325 +10 - Misses 55 59 +4 - Partials 21 23 +2 ``` | [Impacted Files](https://app.codecov.io/gh/checkpoint-restore/checkpointctl/pull/56?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None) | Coverage Δ | | |---|---|---| | [container.go](https://app.codecov.io/gh/checkpoint-restore/checkpointctl/pull/56?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-Y29udGFpbmVyLmdv) | `80.95% <66.66%> (-1.84%)` | :arrow_down: | | [checkpointctl.go](https://app.codecov.io/gh/checkpoint-restore/checkpointctl/pull/56?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-Y2hlY2twb2ludGN0bC5nbw==) | `85.43% <88.88%> (+1.22%)` | :arrow_up: |

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

snprajwal commented 1 year ago

CRIU dump and restore are failing in the CI, I'm not sure why. Is it because of insufficient permissions or something? I based the workflow directly off of the one we have in go-criu.

adrianreber commented 1 year ago

CRIU dump and restore are failing in the CI, I'm not sure why. Is it because of insufficient permissions or something? I based the workflow directly off of the one we have in go-criu.

We run in a unprivileged container. GitHub actions supports privileged containers. Please try that.

snprajwal commented 1 year ago

We run in a unprivileged container. GitHub actions supports privileged containers. Please try that.

Looks like that helped - now both workflows fail on restore :laughing:

adrianreber commented 1 year ago

We run in a unprivileged container. GitHub actions supports privileged containers. Please try that.

Looks like that helped - now both workflows fail on restore :laughing:

You either need to print the logs in an error case or log to stdout.

It could be necessary to mount /lib/modules from the host into the container so that necessary module can be loaded.

But without logs it is difficult to tell.

snprajwal commented 1 year ago

I don't want to create too much noise on this PR. Let me try it out on a separate branch, and push the changes here once I get it working

adrianreber commented 1 year ago

I think we might need to wait a bit with this PR. My favourite approach right now would be to make everything PID tree based. Maybe the table view is only useful for the quick overview. I am curious in how far the tree based approach would work for checkpointctl overall.

snprajwal commented 1 year ago

I think we might need to wait a bit with this PR. My favourite approach right now would be to make everything PID tree based. Maybe the table view is only useful for the quick overview. I am curious in how far the tree based approach would work for checkpointctl overall.

Personally, I think the issue with the tree view is that it will restrict the amount of information we can provide. If we put too much info in the tree, it will become unreadable, and hard to track the parent/children. I was thinking we'll use the tree for the barebones process tree (PID + process name), and use tables for more information. If we display both, then the user can have a quick glance at the tree to understand the process structure and explore the individual process info through the tables.

adrianreber commented 1 year ago

We talked about binary sizes before. I think we should include a CI check about the binary size like Podman has. Build each PR without the PR and record the binary size, build the PR with the changes and record the binary size. Compare both sizes and fail if +10%.

snprajwal commented 1 year ago

I've added the process tree logic here. I didn't want to create a separate branch and PR for it as of now since I had important changes in the first two commits, and did not want to duplicate them. If this looks good, we can discuss how file descriptors, memory maps, etc. integrate into this and add them accordingly. I still have the commit and code for the file descriptors feature saved on a different branch.

adrianreber commented 1 year ago

There are a couple of things that should be done differently. This PR mixes too many things. There is some renaming going on which should be a separate commit. The changes to the testing and the new features should be different PRs.

For the new feature I would like to see some example output in the PR description.

Instead of the loop.c example, let's try to use piggie.c from go-criu, which also creates a couple of namespaces as this will be helpful at some point during testing.

Also the binary size comparison PR should be done before any more includes from go-criu.

snprajwal commented 1 year ago

Got it, I'll split it up and open separate PRs.

snprajwal commented 1 year ago

@adrianreber I've updated the PR description with the tree we currently display during tests (with piggie).

adrianreber commented 1 year ago

Looks good for a first step. I like it.