checkpoint-restore / criu

Checkpoint/Restore tool
criu.org

Feature Proposal: Ignore Sensitive Information in Checkpointing Image #2078

Open wenhuizhang opened 1 year ago

wenhuizhang commented 1 year ago

When we checkpoint, all files in the rootfs are checkpointed, including passwords and credentials. We need a systematic way of excluding sensitive information from the checkpoint image. By adding an option such as "--ignore-sensitive", files and memory containing sensitive information would be ignored.

adrianreber commented 1 year ago

I would say that is not possible. How should that be done? CRIU does not know where in memory things are stored. How should CRIU know whether something is sensitive, and what should it do even if it knew? To me that does not sound doable.

wenhuizhang commented 1 year ago

In the restore procedure, two parts may contain sensitive information: the filesystem diff and the memory.

  1. For the filesystem part, below is a list of files we should ignore because they might leak information. These files should be accessible to the system administrator only:

| Number | Software Name | Description | Location | Comments |
| --- | --- | --- | --- | --- |
| 1 | linux | plaintext password | /etc/passwd | |
| 2 | linux | encrypted password | /etc/shadow | |
| 3 | linux | user access control | /etc/security/* | containers and groups' access control |
| 4 | linux | system log | /etc/rsyslog.d/* | |
| 5 | linux | kmsg | /proc/kmsg | |
| 6 | linux | audit logs | /var/log/audit/audit.log | |
| 7 | linux | authentication logs | /var/log/auth.log | |
| 8 | linux | DNF log | /var/log/dnf.log | software package manager log |
| 9 | linux | system log | /var/log/syslog | |
| 10 | linux | secure module logs | /var/log/secure | |
| 11 | linux | kernel log | /var/log/kern.log | |
| 12 | linux | logs for logins and logouts | /var/log/wtmp | |
| 13 | linux | kernel version | /proc/version | |
| 14 | linux | kernel boot entry point info | /proc/cmdline | |

  2. For the memory part, if we only allow admins to access the above files, the information won't appear in a specific container's memory space.

Please let me know if my understanding needs to be adjusted.

Let us trim this down to a smaller list of files, for example only passwd and shadow, and see if we could ignore them in the checkpoint image.

| Number | Software Name | Description | Location | Comments |
| --- | --- | --- | --- | --- |
| 1 | linux | plaintext password | /etc/passwd | |
| 2 | linux | encrypted password | /etc/shadow | |
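
The file-list idea above can be illustrated with a small sketch. This is only an illustration under stated assumptions: it assumes some tool outside CRIU hands us a list of changed paths before they are packed into a checkpoint archive, and the names `SENSITIVE_PATTERNS` and `split_sensitive` are hypothetical, not part of CRIU, Podman, or CRI-O. The patterns are taken from the list in this comment.

```python
from fnmatch import fnmatchcase

# Patterns copied from the proposed list; this constant is hypothetical
# and does not exist in CRIU, Podman, or CRI-O.
SENSITIVE_PATTERNS = [
    "/etc/passwd",
    "/etc/shadow",
    "/etc/security/*",
    "/var/log/audit/audit.log",
    "/var/log/auth.log",
]


def split_sensitive(changed_paths, patterns=SENSITIVE_PATTERNS):
    """Split paths into (kept, dropped) by shell-style pattern matching."""
    kept, dropped = [], []
    for path in changed_paths:
        # fnmatchcase does case-sensitive matching; "*" also matches "/".
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            dropped.append(path)
        else:
            kept.append(path)
    return kept, dropped


if __name__ == "__main__":
    changed = ["/etc/shadow", "/etc/security/limits.conf", "/srv/app/state.db"]
    kept, dropped = split_sensitive(changed)
    print("kept:", kept)
    print("dropped:", dropped)
```

A filter like this only covers files by name. It says nothing about sensitive data that a process has already loaded into memory.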

adrianreber commented 1 year ago

I do not think your approach can work. Your approach is not really CRIU related as CRIU does not handle file content during checkpointing. If files are changed, however, Podman and CRI-O include the changed files in the checkpoint archive.

But usually /etc/passwd and /etc/shadow are not changed in the container. These files are part of the base image the container is created from, and that image is distributed easily by downloading it. So why should these files be ignored? They are already part of the image.

The bigger problem is sensitive data that the container processes load at some arbitrary place in memory. CRIU cannot know where this data is, and therefore it cannot ignore it.

You wrote:

if we only allows admin accessing the above files, the information won't appear in a specific container's memory space

Not sure what you mean. The files are not the important part. The content of some of the memory pages may contain sensitive information, and there is no way to detect it.

rst0git commented 1 year ago

I agree with Adrian. It would be very difficult to handle all possible cases of sensitive data in CRIU and simply ignoring such data is likely to break the restore process.

We have a similar project idea, Anonymise Image Files, that was proposed in Google Summer of Code^1. The goal of this project is to remove sensitive data from a checkpoint so that it can be submitted as part of a bug report.

I hope this helps.

wenhuizhang commented 1 year ago

Thanks so much for the pointer to issue #360. I am looking into how we could leverage CRIT to rewrite the image files and extend it with anonymization.

wenhuizhang commented 1 year ago

Footnotes

  1. https://criu.org/Google_Summer_of_Code_Ideas#Anonymise_image_files
  2. https://criu.org/Anonymize_image_files

I tried using CRIT to read all the .img files in the checkpoint. It seems that passwords and similar information are hidden inside the binary contents of mem.img, and CRIT by itself cannot resolve them. Some reverse-execution and rewriting tools are needed to rewrite the sensitive information in mem.img.
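
To make the rewriting idea concrete, here is a minimal sketch of overwriting a secret in a raw memory dump. It rests on strong assumptions: the secret is known in advance, it is stored contiguously in plaintext, and zeroing it does not break the restored process. None of this addresses the detection problem raised earlier in the thread. The function name `redact_secret` is hypothetical and is not part of CRIU or CRIT.

```python
def redact_secret(data: bytes, secret: bytes, filler: bytes = b"\x00"):
    """Overwrite each occurrence of `secret` in `data` with filler bytes.

    Returns (redacted_bytes, occurrences). The output has the same length
    as the input, so offsets into the dump stay valid.
    """
    if not secret:
        raise ValueError("secret must not be empty")
    if len(filler) != 1:
        raise ValueError("filler must be exactly one byte")
    # bytes.count and bytes.replace both work on non-overlapping matches.
    occurrences = data.count(secret)
    redacted = data.replace(secret, filler * len(secret))
    return redacted, occurrences


if __name__ == "__main__":
    # Made-up stand-in for a page of process memory, not real CRIU output.
    dump = b"\x00" * 16 + b"password=hunter2" + b"\x00" * 16
    redacted, occurrences = redact_secret(dump, b"hunter2")
    print(occurrences, len(dump) == len(redacted))
```

Keeping the length unchanged matters because the memory image is addressed by offset. Any rewrite that shifts bytes would corrupt everything after the edit.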

github-actions[bot] commented 1 year ago

A friendly reminder that this issue had no activity for 30 days.