Open JaChouSSS opened 2 years ago
@JaChouSSS Thanks for reporting the GitHub issue.
The segmentation fault should create a core dump file (with ulimit -c unlimited). Could you please share the backtrace?
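For anyone following along, here is a minimal sketch of one way to collect that backtrace. It assumes gdb is installed and that the kernel writes the core file as "core" in the current directory (check /proc/sys/kernel/core_pattern; many distributions route it elsewhere). The helper name print_backtrace is hypothetical, and the binary passed to gdb must be the one that actually crashed.

```shell
#!/bin/sh
# Sketch: print a backtrace from a core dump.
# Assumptions: gdb is installed; the core file path is known.

# Hypothetical helper: print a backtrace for a binary and its core file.
print_backtrace() {
    binary="$1"
    core_file="$2"
    if [ ! -f "$core_file" ]; then
        echo "no core file at $core_file" >&2
        return 1
    fi
    # -batch exits after running the commands; -ex "bt" prints the stack.
    gdb -batch -ex "bt" "$binary" "$core_file"
}

# Typical use (commented out so the sketch has no side effects):
#   ulimit -c unlimited          # allow core files in this shell
#   <launch command that crashes>
#   print_backtrace ./test_cuda core   # assumes ./test_cuda is what crashed
```

If no core file appears after the crash, the core_pattern setting or a container limit is the usual reason, so that is worth checking before rerunning.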
In addition, I failed to test all the examples in /test, but DMTCP worked.
You need to give more information on how you are launching and checkpointing a test case so that we can help you fix the issue.
@JainTwinkle Thanks for your quick reply. All of the code is run in Docker, and the environment is as follows: Ubuntu 16.04, gcc 7.5, CUDA 11.0.
I tested all the examples using the make check command and followed INSTALL for all the commands. My test code is shown below:
Is there something wrong with my installation, or am I missing some steps?
Thanks for sending the information about the running environment. I was asking how you are running it. Could you tell us the launch command you used to run this program? Also, if it is segfaulting, I'd like to see the stack trace (backtrace) to see where it is crashing.
Thank you very much for the reminder: I was probably running the wrong command before (the segmentation fault may also have been caused by this). I used the make command in /split-cuda. But when I run the make check command, I still get an error.
But libcuda_wrappers.so exists in /split-cuda. Do you know how to fix it?
@JaChouSSS Hi, I took the following steps:
a. Create a new directory with mkdir.
b. Copy this library into the new directory.
c. Open ~/.bashrc and append the absolute path of the new directory to the export LD_LIBRARY_PATH=XXXX line.
d. Run source ~/.bashrc.
e. Run your checkpoint command.
Hi @JainTwinkle @JaChouSSS, I got the same output as JaChouSSS. Do you know how to fix it? Thanks very much!
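The steps above can be sketched as a small shell helper. This is only an illustration: the function name stage_library and every path in the usage comment are placeholders, not something confirmed in this thread. The helper copies the library into a new directory and prints the export line to append to ~/.bashrc, rather than editing the file itself.

```shell
#!/bin/sh
# Sketch of steps a-e. All names and paths are assumptions.

# Hypothetical helper: copy a library into a new directory and print the
# LD_LIBRARY_PATH line that would be appended to ~/.bashrc.
stage_library() {
    lib_path="$1"      # the shared library to copy
    target_dir="$2"    # new directory that will hold the copy
    mkdir -p "$target_dir" || return 1           # step a
    cp "$lib_path" "$target_dir"/ || return 1    # step b
    # Resolve to an absolute path so the setting works from any directory.
    abs_dir=$(cd "$target_dir" && pwd) || return 1
    # Step c: keep any existing LD_LIBRARY_PATH entries after the new one.
    printf 'export LD_LIBRARY_PATH="%s${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"\n' "$abs_dir"
}

# Typical use (commented out so the sketch does not touch ~/.bashrc):
#   stage_library /path/to/libcuda_wrappers.so "$HOME/crac-libs" >> ~/.bashrc
#   . ~/.bashrc      # step d
#   then rerun the checkpoint command (step e)
```

Printing the line instead of writing it lets you inspect it first, and the ${LD_LIBRARY_PATH:+...} form avoids a trailing colon when the variable was previously unset.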
[ffk@g-v100-1-worker0001 test_gpu]$ CRAC-early-development-master/bin/dmtcp_launch --interval 3 ./test_cuda
[40000] WARNING at dlwrappers.cpp:72 in dlopen; REASON='JWARNING(false) failed'
filename = libcuda.so.1
flag = 2
[40000] WARNING at dlwrappers.cpp:80 in dlopen; REASON='JWARNING(false) failed'
filename =
flag = 2
[40000] NOTE at writeckpt.cpp:263 in mtcp_writememoryareas; REASON='before calling to skip'
(void *)area.addr = 0x400000
(void *)area.endAddr = 0x401000
area.size = 4096
Segmentation fault
Hi @JainTwinkle, we've been trying to use CRAC but got stuck on the same issue: it segfaults while trying to locate the "mmap" and "sbrk" symbols in the symbol table of /lib64/ld-linux-x86-64.so.2.
We are using Ubuntu 20.04.6 LTS.
Could you advise how to address this issue? Thanks!
I got a segmentation fault when I tested my sample CUDA application:
In addition, I failed to test all the examples in /test, but DMTCP worked.
Can you help me with these questions?