checkpoint-restore / criu

Checkpoint/Restore tool
criu.org

/proc/<pid>/map_files with deleted file makes CRIU dumping failed #2016

Open zcharlesz opened 1 year ago

zcharlesz commented 1 year ago

Description

If a program uses a named semaphore, CRIU cannot dump/restore it properly until the semaphore is destroyed. sem_open(..) creates a temporary file in /dev/shm and maps it into memory (the mapping can be found in /proc/\<pid>/map_files); once the named semaphore has been created, the temporary file is unlinked (deleted), and the CRIU dump fails.
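The sequence described above can be sketched in plain C. This is only an illustration of the create, map, link, unlink pattern, not the actual glibc implementation; the helper name `map_via_temp_file` and its parameters are hypothetical. After it runs, the mapping would be expected to show up under /proc with the temporary name plus a ` (deleted)` tag, even though the data is still reachable through the final name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of glibc or CRIU): create a temporary file
 * in `dir`, map it, hard-link it to `dir/final_name`, then unlink the
 * temporary name. The temporary path is written to `tmp_path` so the caller
 * can inspect it. Returns the mapping, or NULL on failure.
 */
void *map_via_temp_file(const char *dir, const char *final_name, size_t len,
                        char *tmp_path, size_t tmp_path_len)
{
    char final_path[4096];
    void *addr = MAP_FAILED;
    int n, fd;

    assert(dir != NULL && final_name != NULL && tmp_path != NULL);

    n = snprintf(tmp_path, tmp_path_len, "%s/tmpXXXXXX", dir);
    if (n < 0 || (size_t)n >= tmp_path_len)
        return NULL;
    n = snprintf(final_path, sizeof(final_path), "%s/%s", dir, final_name);
    if (n < 0 || (size_t)n >= sizeof(final_path))
        return NULL;

    /* mkstemp() replaces the XXXXXX suffix and opens the new file. */
    fd = mkstemp(tmp_path);
    if (fd < 0)
        return NULL;

    /* Size the file and map it while it still has its temporary name. */
    if (ftruncate(fd, (off_t)len) == 0)
        addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* Publish the file under its final name. */
    if (addr != MAP_FAILED && link(tmp_path, final_path) != 0) {
        munmap(addr, len);
        addr = MAP_FAILED;
    }

    /* Drop the temporary name; the mapping was made through this name. */
    unlink(tmp_path);
    close(fd);
    return addr == MAP_FAILED ? NULL : addr;
}
```

Calling `map_via_temp_file("/dev/shm", "sem.demo", 4096, buf, sizeof(buf))` from a small `main()` and then inspecting /proc/\<pid>/maps for that process is one way to check whether a given system shows the same deleted path that CRIU trips over.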

I tried using --link-remap for dump/restore: the dump succeeded, but the restore failed.

Steps to reproduce the issue:

  1. The code below reproduces the problem (demo code):

    /* semopen.c */
    #include <stdio.h>
    #include <fcntl.h>
    #include <semaphore.h>
    #include <ctype.h>
    #include <sys/stat.h>
    #include <sys/types.h>

    int main()
    {
        sem_t *sem;
        char name[20];

        sprintf(name, "abc");
        /* Create (or open) the named semaphore "abc". */
        sem = sem_open(name, O_CREAT, 0644, 1);

        /* Spin forever so the process can be dumped while the semaphore is open. */
        int i = 1;
        while (i == 1) { i = 1; }
        //sem_close(sem);
        //sem_unlink(name);
    }

2. command

    gcc -o semopen semopen.c -lpthread
    ./semopen
    criu dump -v4 -o dump.log --shell-job --display-stats --file-validation buildid -t 1081175 -D t14_link4 -R && echo ok

3. dump error result

    (00.025311) Dumping path for -3 fd via self 12 [/dev/shm/DTuB4A (deleted)]
    (00.025322) Strip ' (deleted)' tag from './dev/shm/DTuB4A (deleted)'
    (00.025328) Error (criu/files-reg.c:1045): Can't create link remap for /dev/shm/DTuB4A. Use link-remap option.
    (00.025340) Error (criu/cr-dump.c:1530): Collect mappings (pid: 1081175) failed with -1
    (00.025589) Unlock network
    (00.025598) Unfreezing tasks into 1
    (00.025604) Unseizing 1081175 into 1
    (00.025623) Error (criu/cr-dump.c:2059): Dumping FAILED.

4. check /proc/\<pid\>/map_files, it shows:

7fba9a3bf000-7fba9a3c0000 -> '/dev/shm/DTuB4A (deleted)'


5. Adding --link-remap to the dump command makes the dump succeed, but the restore still fails with the following error:

    (00.018802) 1081175: Warn (criu/files-reg.c:1807): Can't link dev/shm/link_remap.6 -> dev/shm/DTuB4A
    (00.018832) 1081175: Error (criu/files-reg.c:2184): Can't link dev/shm/link_remap.6 -> dev/shm/DTuB4A: No such file or directory
    (00.018850) 1081175: Error (criu/mem.c:1372): `- Can't open vma
    (00.019013) Error (criu/cr-restore.c:2528): Restoring FAILED.
    (00.020307) Error (criu/cr-restore.c:1502): 1081175 killed by signal 9: Killed


6. However, if the named semaphore file /dev/shm/sem.abc already exists before the program is executed, /proc/\<pid\>/map_files points to the correct path and CRIU works properly. (Running the program twice without deleting /dev/shm/sem.abc reproduces this case.)
Checking /proc/\<pid\>/map_files then shows:
``` 7fc40cec5000-7fc40cec6000 -> /dev/shm/sem.abc ```

**CRIU logs and information:**


<details><summary>CRIU full dump/restore logs:</summary>
<p>

The logs can be reproduced with the demo code above.


</p>
</details>

<details><summary>Output of `criu --version`:</summary>
<p>

Version: 3.17


</p>
</details>

<details><summary>Output of `criu check --all`:</summary>
<p>

Warn (criu/cr-check.c:1334): Nftables based locking requires libnftables and set concatenations support
Looks good but some kernel features are missing which, depending on your process tree, may cause dump or restore failure.



</p>
</details>

**Additional environment details:**
Ubuntu 20.04

avagin commented 1 year ago

I tried your reproducer a few times and it didn't reproduce the issue. Could you add the exact sequence of actions that I need to perform to reproduce the issue?

zcharlesz commented 1 year ago

> I tried your reproducer a few times and it didn't reproduce the issue. Could you add the exact sequence of actions that I need to perform to reproduce the issue?

Steps to reproduce the issue:

  1. Create semopen.c (with the exact code from the issue description; don't uncomment sem_unlink and sem_close).
  2. Compile it: gcc -o semopen semopen.c -lpthread (gcc version is 9.4.0, Ubuntu version is 20.04).
  3. Make sure the /dev/shm/sem.abc file does not exist (if it exists, everything works fine). Run ./semopen; in another terminal, get the pid of semopen and try criu dump (edit the pid and directory in the command line): criu dump -v4 -o dump.log --shell-job --display-stats --file-validation buildid -t 1081175 -D t14_link4 -R && echo ok
  4. The dump should fail. If you check /proc/pid/map_files/, there should be a link that looks like address -> /dev/shm/xxxx (deleted).
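The check in step 4 can also be scripted. Below is a minimal sketch, assuming it is acceptable to read /proc/\<pid>/maps instead of /proc/\<pid>/map_files (the symlinks in map_files may need elevated privileges to read, while maps carries the same ` (deleted)` tag). The function names `is_deleted_mapping` and `count_deleted_mappings` are hypothetical and are not part of CRIU.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Returns 1 if a /proc/<pid>/maps line describes a file-backed mapping
 * whose path ends with the " (deleted)" tag, 0 otherwise.
 */
int is_deleted_mapping(const char *line)
{
    static const char tag[] = " (deleted)";
    size_t tag_len = sizeof(tag) - 1;
    size_t len = strlen(line);

    /* Ignore a trailing newline left by fgets(). */
    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
        len--;
    if (len < tag_len)
        return 0;
    if (strncmp(line + len - tag_len, tag, tag_len) != 0)
        return 0;
    /* File-backed mappings carry an absolute path. */
    return strchr(line, '/') != NULL;
}

/*
 * Prints every deleted file mapping of `pid` and returns how many were
 * found, or -1 if /proc/<pid>/maps could not be opened.
 */
int count_deleted_mappings(pid_t pid)
{
    char path[64];
    char line[4352];
    int count = 0;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%d/maps", (int)pid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;

    while (fgets(line, sizeof(line), f) != NULL) {
        if (is_deleted_mapping(line)) {
            fputs(line, stdout);
            count++;
        }
    }
    fclose(f);
    return count;
}
```

Calling `count_deleted_mappings()` with the pid of the running semopen process before `criu dump` would show whether the process holds a mapping like the `/dev/shm/xxxx (deleted)` one from step 4.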

If you still can't reproduce the issue, could you tell me which system, CRIU version, and gcc version you are using? I will try it on my PC. The core problem is that sem_open() maps a temporary file and then unlinks it if the named semaphore didn't exist; CRIU detects the temporary file in /proc/pid/map_files and can't dump it because it is deleted. Could the environment change the behaviour of sem_open()? @avagin

avagin commented 1 year ago

fyi: CRIU doesn't save/restore file system states and it is a user responsibility to restore file systems to the state when a dump has been made.

zcharlesz commented 1 year ago

I have tried the --link-remap option before. The dump succeeds, but it cannot be restored (if I dump with the -R option, restore fails; without -R, restore succeeds), because the file CRIU detected in map_files was unlinked (deleted) before the dump, so there is no way to restore that file.

This is the output of restore.log when dumping with -R:

(00.017307) 1200716: Error (criu/files-reg.c:2184): Can't link dev/shm/link_remap.6 -> dev/shm/TEUyiI: No such file or directory
(00.017325) 1200716: Error (criu/mem.c:1372): `- Can't open vma
(00.017462) Error (criu/cr-restore.c:2528): Restoring FAILED.

I have already saved the file system using action scripts, but it is impossible to save a file that was deleted before dumping; CRIU nevertheless detects the file link in /proc/pid/map_files. Even with the --link-remap option, I could not restore the deleted file.

Is there an option to ignore the deleted map_files link while dumping?

zcharlesz commented 1 year ago

> fyi: CRIU doesn't save/restore file system states and it is a user responsibility to restore file systems to the state when a dump has been made.

ok, I may have found the solution. If I use --link-remap without the -R option, both dump and restore are ok. If I use --link-remap together with -R, I have to create a "fake file" (the deleted file in map_files and a fake link_remap.6 file) before it can be restored successfully. The "create-fake-file" step is done by CRIU without the -R option, and it has to be done by the user with the -R option. Thanks for your reply.

github-actions[bot] commented 1 year ago

A friendly reminder that this issue had no activity for 30 days.