Feature Proposal: Ignore Sensitive Information in Checkpointing Image

wenhuizhang commented 1 year ago

When a checkpoint is taken, all files in the rootfs are checkpointed, including passwords and credentials. We need a systematic way of excluding sensitive information from the checkpoint image. With a new option such as "--ignore-sensitive", the files and memory that hold sensitive information would be left out.

adrianreber commented 1 year ago

I would say that is not possible. How should that be done? CRIU does not know where in memory things are stored. How should CRIU know whether something is sensitive, and what should it do even if it knew? To me that does not sound doable.

wenhuizhang commented 1 year ago

In the restore procedure, two parts may contain sensitive information: the filesystem diff and the memory.

For the filesystem part, below is a list of files we should ignore because they might leak information. Access to these files should be restricted to the system administrator only:

Number	Software Name	Description	Location	Comments
1	linux	plaintext password	/etc/passwd
2	linux	encrypted password	/etc/shadow
3	linux	user access control	/etc/security/*	containers and groups' access control
4	linux	system log	/etc/rsyslog.d/*
5	linux	kernel messages	/proc/kmsg
6	linux	audit logs	/var/log/audit/audit.log
7	linux	authentication logs	/var/log/auth.log
8	linux	DNF log	/var/log/dnf.log	software package manager log
9	linux	system log	/var/log/syslog
10	linux	secure module logs	/var/log/secure
11	linux	kernel log	/var/log/kern.log
12	linux	logs for logins and logouts	/var/log/wtmp
13	linux	kernel version	/proc/version
14	linux	kernel boot entry point info	/proc/cmdline

For the memory part, if only the admin is allowed to access the above files, the information won't appear in a specific container's memory space.

Please let me know if my understanding needs to be adjusted.

Let us trim this down to a smaller list of files, for example only passwd and shadow, and see whether we could ignore them in the checkpoint image.
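As a rough illustration of the trimmed-down idea above, here is a minimal sketch that copies a tar archive while skipping entries that match a small deny-list. It assumes the filesystem diff is available as a plain tar file. The names `SENSITIVE_PATTERNS`, `is_sensitive`, and `filter_tar` are hypothetical and are not part of CRIU, Podman, or CRI-O.

```python
import fnmatch
import posixpath
import tarfile

# Hypothetical deny-list; paths are relative to the container root.
SENSITIVE_PATTERNS = ["etc/passwd", "etc/shadow"]


def is_sensitive(name, patterns=SENSITIVE_PATTERNS):
    """Return True if a tar member name matches any deny-list pattern."""
    # Normalize "./etc/passwd" and "/etc/passwd" to "etc/passwd".
    normalized = posixpath.normpath(name).lstrip("/")
    return any(fnmatch.fnmatchcase(normalized, p) for p in patterns)


def filter_tar(src_path, dst_path, patterns=SENSITIVE_PATTERNS):
    """Copy src_path to dst_path, skipping members that match patterns.

    Returns the list of member names that were left out.
    """
    skipped = []
    with tarfile.open(src_path, "r") as src, tarfile.open(dst_path, "w") as dst:
        for member in src.getmembers():
            if is_sensitive(member.name, patterns):
                skipped.append(member.name)
                continue
            # Only regular files carry data; directories and links do not.
            fileobj = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, fileobj)
    return skipped
```

This only covers files that actually appear in the archive, and it does nothing about copies of the same data that a process has already read into memory.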

adrianreber commented 1 year ago

I do not think your approach can work. Your approach is not really CRIU related as CRIU does not handle file content during checkpointing. If files are changed, however, Podman and CRI-O include the changed files in the checkpoint archive.

But usually /etc/passwd and /etc/shadow are not changed in the container. These files are part of the base image the container is created from, and that image is easily distributed by downloading it. So why should these files be ignored? They are already part of the image.

The bigger problem is sensitive data that the container processes load at some arbitrary place in memory. CRIU cannot know where this is and therefore cannot ignore it.

You wrote:

if we only allows admin accessing the above files, the information won't appear in a specific container's memory space

Not sure what you mean. The files are not what matters: the content of some memory pages may contain sensitive information, and there is no way to detect it.
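To make the detection problem concrete, here is a minimal sketch that scans a raw byte buffer for a secret that is already known. It assumes the memory content is available as plain bytes and a 4096-byte page size. The helper name `find_known_secret` is hypothetical, and the code does not parse any CRIU image format.

```python
PAGE_SIZE = 4096  # assumed page size; not read from any image


def find_known_secret(dump, secret, page_size=PAGE_SIZE):
    """Return (page_index, offset_in_page) for every occurrence of secret."""
    if not secret:
        raise ValueError("secret must not be empty")
    hits = []
    start = dump.find(secret)
    while start != -1:
        hits.append(divmod(start, page_size))
        start = dump.find(secret, start + 1)
    return hits
```

A search like this only finds byte sequences it is told to look for. A secret that is unknown in advance, or stored in an encoded or split form, leaves nothing to match, which is the difficulty described above.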

rst0git commented 1 year ago

I agree with Adrian. It would be very difficult to handle all possible cases of sensitive data in CRIU and simply ignoring such data is likely to break the restore process.

We have a similar project idea, Anonymise Image Files, that was proposed in Google Summer of Code^1. The goal of this project is to remove sensitive data from a checkpoint so that it can be submitted as part of a bug report.

I hope this helps.

wenhuizhang commented 1 year ago

Thanks so much for the pointer to issue #360. I am looking into how we could leverage CRIT to rewrite the .img files and extend it with anonymization.

wenhuizhang commented 1 year ago

Footnotes

https://criu.org/Google_Summer_of_Code_Ideas#Anonymise_image_files ↩

https://criu.org/Anonymize_image_files ↩

I tried using CRIT to read all the .img files in the checkpoint. Information such as passwords seems to be hidden inside the binary content of mem.img, and CRIT by itself cannot resolve it. Additional tools, for example for reverse execution and rewriting, would be needed to rewrite sensitive information in mem.img.
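As a sketch of what such a rewrite step might look like in the simplest case, the function below overwrites every occurrence of a known byte string in a raw buffer with filler bytes of the same length, so the buffer size and all other offsets stay unchanged. The name `redact_in_place` is hypothetical. The code works on plain bytes, does not understand CRIU image formats, and makes no claim that a process restored from the rewritten data would still run correctly.

```python
def redact_in_place(blob, secret, filler=b"\x00"):
    """Overwrite each occurrence of secret in blob with filler bytes.

    Returns (new_blob, count). The result has the same length as blob.
    """
    if not secret:
        raise ValueError("secret must not be empty")
    if len(filler) != 1:
        raise ValueError("filler must be exactly one byte")
    buf = bytearray(blob)
    count = 0
    start = buf.find(secret)
    while start != -1:
        # Same-length replacement keeps every other offset intact.
        buf[start:start + len(secret)] = filler * len(secret)
        count += 1
        start = buf.find(secret, start + len(secret))
    return bytes(buf), count
```

Keeping the length fixed avoids shifting page boundaries, but the process may still depend on the overwritten value after restore.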

github-actions[bot] commented 1 year ago

A friendly reminder that this issue had no activity for 30 days.

checkpoint-restore / criu

Feature Proposal: Ignore Sensitive Information in Checkpointing Image #2078
