checkpoint-restore / criu

Checkpoint/Restore tool
criu.org

criu c/r failed on qt gui demo & qt console demo on ubuntu #2526

Open oophelia opened 2 days ago

oophelia commented 2 days ago

Description

Steps to reproduce the issue:

  1. Build a Qt GUI demo and a Qt console demo on Ubuntu.
  2. Find the PID of the running demo with pidof.
  3. Run: sudo criu dump -t 157888 --tcp-established --shell-job -o dump.log -v4 -D cp9

Both are very simple Qt demos. Qt GUI demo: the main window shows a text counter of seconds that increases by 1 every second. Qt console demo: it prints the same seconds counter to the console, increasing by 1 every second.

Describe the results you received: Qt GUI demo: the dump failed and reported "External socket is used". With --ext-unix-sk the dump still failed and reported "Can't dump half of stream unix connection".

Qt console demo: the logs report that both the dump and the restore succeeded, but the process was not actually restored.

Describe the results you expected: both demos can be checkpointed and restored successfully.

Additional information you deem important (e.g. issue happens only occasionally):

CRIU logs and information:

CRIU full dump/restore logs:

qt gui demo dump

sudo criu dump -t 189360 --tcp-established --shell-job -o dump.log -v4 -D gui-demo/ --ext-unix-sk

```
(00.041265) Obtaining task auvx ...
(00.041391) Dumping path for -3 fd via self 17 [/home/parallels/qt-project-test/build/Desktop_Qt_6_8_0-Release]
(00.041408) Dumping path for -3 fd via self 17 [/]
(00.041411) Dumping task cwd id 0xa4 root id 0xa5
(00.041453) Dumping file-locks
(00.041457)
(00.041458) Dumping pstree (pid: 189360)
(00.041459) ----------------------------------------
(00.041459) Process: 189360(189360)
(00.041467) ----------------------------------------
(00.041494) cg: Dumping 1 sets
(00.041496) cg: `- Dumping of /user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-5a403202-721e-4166-bf85-eba3f2f2e72b.scope
(00.041497) cg: `- Dumping name=zdtmtst of /
(00.041497) cg: `- Dumping name=zdtmtst.defaultroot of /
(00.041503) cg: Writing CG image
(00.041518) unix: Dumping external sockets
(00.041527) unix: Dumping extern: ino 737870 peer_ino 737869 family 1 type 1 state 1 name /run/user/1000/bus
(00.041530) unix: Dumped extern: id 0xa6 ino 737870 peer 0 type 2 state 10 name 19 bytes
(00.041531) unix: Ext stream not supported: ino 737870 peer_ino 737869 family 1 type 1 state 1 name /run/user/1000/bus
(00.041532) Error (criu/sk-unix.c:881): unix: Can't dump half of stream unix connection. name: (null); peer name: /run/user/1000/bus
(00.041570) net: Unlock network
(00.041573) Unfreezing tasks into 1
(00.041574) Unseizing 189360 into 1
(00.041584) Error (criu/cr-dump.c:2111): Dumping FAILED.
```

qt console demo dump

sudo criu dump -t 189917 --tcp-established --shell-job -o dump.log -v4 -D con3/

```
(00.018022) Dumping pstree (pid: 189917)
(00.018023) ----------------------------------------
(00.018024) Process: 189917(189917)
(00.018057) ----------------------------------------
(00.018092) cg: Dumping 1 sets
(00.018095) cg: `- Dumping of /user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-5a403202-721e-4166-bf85-eba3f2f2e72b.scope
(00.018098) cg: `- Dumping name=zdtmtst of /
(00.018099) cg: `- Dumping name=zdtmtst.defaultroot of /
(00.018104) cg: Writing CG image
(00.018139) unix: Dumping external sockets
(00.018158) Writing image inventory (version 1)
(00.018249) Running post-dump scripts
(00.018261) Unfreezing tasks into 2
(00.018263) Unseizing 189917 into 2
(00.018623) Writing stats
(00.018653) Dumping finished successfully
```

qt console demo restore

sudo criu restore --tcp-established --shell-job -o restore.log -v4 -D con3/

```
(00.020760) net: Unlock network
(00.020779) pie: 189917: seccomp: mode 0 on tid 189917
(00.025203) 189917 was trapped
(00.025215) 189917 was trapped
(00.025217) 189917 (native) is going to execute the syscall 139, required is 139
(00.025224) 189917 was stopped
(00.025236) 189917 was trapped
(00.025237) 189917 (native) is going to execute the syscall 215, required is 215
(00.025251) 189917 was stopped
(00.025256) Run late stage hook from criu master for external devices
(00.025257) restore late stage hook for external plugin failed
(00.025258) Running pre-resume scripts
(00.025261) Restore finished successfully. Tasks resumed.
(00.025262) Writing stats
(00.025295) Running post-resume scripts
```
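With -v4 the logs are long, so it can help to pull out only the lines that mention an error or a failure. The following is a minimal sketch, not part of CRIU itself: `criu_log_problems` is a hypothetical helper name, and the path passed to it is assumed to be wherever the file given to -o ended up.

```shell
#!/bin/sh
# Hypothetical helper (not a CRIU command): print every line of a CRIU log
# that mentions an error or a failure, prefixed with its line number.
# The match is case-insensitive so that it catches "Error (...)",
# "Dumping FAILED." and "... plugin failed" alike.
criu_log_problems() {
    grep -niE 'error|fail' "$1"
}

# Only run when a log path is given on the command line.
if [ "$#" -gt 0 ]; then
    criu_log_problems "$1"
fi
```

Run against the GUI dump log above, this would surface the two `Error (...)` lines. Run against the console restore log, it would surface the "restore late stage hook for external plugin failed" line, which is easy to miss because it is not marked as an error.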

Output of `criu --version`:

```
Version: 4.0
GitID: v4.0-33-gdd6b580b4
```

Output of `criu check --all`:

```
Warn (criu/cr-check.c:824): Dirty tracking is OFF. Memory snapshot will not work.
Warn (criu/cr-check.c:1259): Do not have API to map vDSO - will use mremap() to restore vDSO
Warn (criu/cr-check.c:1179): CRIU built without CONFIG_COMPAT - can't C/R compatible tasks
Looks good but some kernel features are missing which, depending on your process tree, may cause dump or restore failure.
```

Additional environment details:

adrianreber commented 2 days ago

Not sure what your test programs are but generally CRIU cannot checkpoint GUI applications. Have you seen https://criu.org/VNC ? It describes a possible way to do it, but processes with a GUI are not really supported.

oophelia commented 2 days ago

> Not sure what your test programs are but generally CRIU cannot checkpoint GUI applications. Have you seen https://criu.org/VNC ? It describes a possible way to do it, but processes with a GUI are not really supported.

Thanks for replying. The Qt GUI demo probably fails for this reason, but the Qt console demo should not be using any GUI. The log shows that the restore succeeded, but I can't find the process in the process list.

#include <QCoreApplication>
#include <QTimer>
#include <iostream>
#include <signal.h>
#include <unistd.h>  // getpid()

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);
    int counter = 0;

    // Signal handlers to catch unexpected exits
    signal(SIGTERM, [](int) { std::cout << "SIGTERM caught!" << std::endl; });
    signal(SIGSEGV, [](int) { std::cout << "Segmentation fault caught!" << std::endl; });

    // Log process ID and other information
    std::cout << "PID: " << getpid() << " started." << std::endl;

    QTimer timer;
    QObject::connect(&timer, &QTimer::timeout, [&]() {
        counter++;
        std::cout << "Counter: " << counter << std::endl;
    });
    timer.start(1000);

    return app.exec();
}
adrianreber commented 2 days ago

Thanks for the example. Using your code I am able to checkpoint and restore the application without any problems:

# criu restore -j 
Counter: 25
Counter: 26
Counter: 27

Why do you say that the restore does not work? How do you check if the process is running?

oophelia commented 2 days ago

> Thanks for the example. Using your code I am able to checkpoint and restore the application without any problems:
>
>     # criu restore -j
>     Counter: 25
>     Counter: 26
>     Counter: 27
>
> Why do you say that the restore does not work? How do you check if the process is running?

There is no output on the terminal after the restore. I then checked the most recently started processes with the command below and couldn't find the related process:

`ps -eo pid,lstart,cmd --sort=-lstart | head -n 10`

By the way, how do you compile this code? I compiled it using Qt Creator with Qt kits, as a complete Qt console application project.
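Since CRIU restores a task under its original PID (189917 in the restore log above), another way to check is to look up that PID directly instead of sorting by start time. A minimal sketch, assuming a Linux /proc filesystem; `pid_is_running` is a hypothetical helper name, not a CRIU or procps command:

```shell
#!/bin/sh
# Hypothetical helper: report whether a given PID currently exists.
# Assumes Linux, where every live process has a /proc/<pid> directory.
pid_is_running() {
    if [ -d "/proc/$1" ]; then
        echo "running"
    else
        echo "not running"
    fi
}

# Example: check the current shell's own PID.
pid_is_running "$$"
```

For the console demo this would be called as `pid_is_running 189917` right after the restore. `ps -p 189917 -o pid,lstart,cmd` gives the same answer with more detail.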

adrianreber commented 2 days ago

> by the way, how do you compile this code?

I searched and found documentation that said to do this:

# qmake6 -project
# qmake6
# make