DMTCP-CRAC / CRAC-early-development

Segment fault #7

Open JaChouSSS opened 2 years ago

JaChouSSS commented 2 years ago

I got a segmentation fault when I tested my sample CUDA application: [screenshot: segment_error]

In addition, all the examples in /test failed for me, but DMTCP itself worked.

Can you help me with these questions?

JainTwinkle commented 2 years ago

@JaChouSSS Thanks for reporting the GitHub issue. The segmentation fault should create a core dump file (with ulimit -c unlimited). Could you please share the backtrace?
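
A minimal sketch of collecting that backtrace, assuming bash and gdb are available. The helper names enable_core_dumps and print_backtrace are hypothetical (not part of DMTCP or CRAC). Where the core file lands and what it is called depend on the kernel's core_pattern setting, which is kernel-wide, so inside Docker it is the host's value.

```shell
#!/bin/bash
# Hypothetical helpers for getting a backtrace out of a core dump.
# Assumes bash and gdb; not part of DMTCP or CRAC.

# Raise the core-file size limit as far as the hard limit allows.
enable_core_dumps() {
  ulimit -c unlimited 2>/dev/null || ulimit -c "$(ulimit -H -c)"
}

# Print the stack trace recorded in CORE for executable EXE.
print_backtrace() {
  local exe=$1 core=$2
  gdb -batch -ex "bt" "$exe" "$core"
}

enable_core_dumps
echo "core size limit: $(ulimit -c)"
# core_pattern is kernel-wide, so a container sees the host's setting.
echo "core pattern: $(cat /proc/sys/kernel/core_pattern 2>/dev/null)"
```

After enable_core_dumps, reproduce the crash from the same shell and then run something like print_backtrace ./test_cuda core; both the program name and the core file name here are placeholders that depend on your setup.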

> In addition, I failed to test all the examples in /test, but DMTCP worked.

Please give more information on how you are launching and checkpointing a test case, so that we can help you fix the issue.

JaChouSSS commented 2 years ago

@JainTwinkle Thanks for your quick reply. Everything runs in Docker, and the environment is as follows: Ubuntu 16.04, gcc 7.5, CUDA 11.0.

I ran all the examples with the make check command and followed INSTALL for every command. My test code is shown below:

[Screenshot: test code]

Is there something wrong with my installation, or am I missing some steps?

JainTwinkle commented 2 years ago

Thanks for sending the information about your running environment. I was asking how you are running it: could you tell us the launch command you used to run this program? Also, if it is segfaulting, I'd like to see the stack trace (backtrace) to find where it is crashing.

JaChouSSS commented 2 years ago

Thank you very much for the reminder: I was running my previous code with the wrong command (this may also have caused the segmentation fault). I used the make command in /split-cuda, but when I run make check, I still get an error:

[Screenshot: make check error output]

But libcuda_wrappers.so exists in /split-cuda. Do you know how to fix it?

GoodKairos commented 1 year ago

@JaChouSSS Hi, I took the following steps:
a. Create a new directory (mkdir).
b. Copy this library into the new directory.
c. Open ~/.bashrc and add the new directory's absolute path to the export LD_LIBRARY_PATH= XXXX line.
d. Run source ~/.bashrc.
e. Run your checkpoint command.
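
The steps above could be scripted roughly as follows. This is a bash sketch; the function name add_lib_dir and its three arguments are hypothetical. It prepends the new directory to any existing LD_LIBRARY_PATH instead of overwriting it, and it avoids appending the same export line to the rc file twice.

```shell
#!/bin/bash
# Sketch of the steps above; add_lib_dir is a hypothetical helper name.
# Usage: add_lib_dir LIB_FILE DEST_DIR RC_FILE
add_lib_dir() {
  local lib=$1 dest=$2 rc=$3 line
  mkdir -p "$dest" || return 1            # a. create the directory
  cp "$lib" "$dest"/ || return 1          # b. copy the library into it
  dest=$(cd "$dest" && pwd) || return 1   # use the absolute path
  # c. prepend the directory, keeping any existing LD_LIBRARY_PATH entries;
  #    skip the append if the exact line is already in RC_FILE
  line="export LD_LIBRARY_PATH=\"$dest\${LD_LIBRARY_PATH:+:\$LD_LIBRARY_PATH}\""
  grep -qxF "$line" "$rc" 2>/dev/null || echo "$line" >> "$rc"
  # d. apply the same change to the current shell, as sourcing RC_FILE would
  case ":${LD_LIBRARY_PATH:-}:" in
    *":$dest:"*) ;;                       # already on the path
    *) export LD_LIBRARY_PATH="$dest${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
  esac
}
```

For example, add_lib_dir /path/to/split-cuda/libcuda_wrappers.so ~/crac-libs ~/.bashrc, then run the checkpoint command (step e); both paths are placeholders.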

ffkjjj15 commented 1 year ago

Hi @JainTwinkle @JaChouSSS, I got the same output as JaChouSSS. Do you know how to fix it? Thanks very much!

[ffk@g-v100-1-worker0001 test_gpu]$ CRAC-early-development-master/bin/dmtcp_launch --interval 3 ./test_cuda
[40000] WARNING at dlwrappers.cpp:72 in dlopen; REASON='JWARNING(false) failed'
     filename = libcuda.so.1
     flag = 2
[40000] WARNING at dlwrappers.cpp:80 in dlopen; REASON='JWARNING(false) failed'
     filename =
     flag = 2
[40000] NOTE at writeckpt.cpp:263 in mtcp_writememoryareas; REASON='before calling to skip'
     (void *)area.addr = 0x400000
     (void *)area.endAddr = 0x401000
     area.size = 4096
Segmentation fault

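
The two warnings are raised in dlopen (dlwrappers.cpp) while libcuda.so.1 is being loaded; this thread does not establish whether they are related to the crash. One quick check is whether the dynamic loader can find libcuda.so.1 at all. It ships with the NVIDIA driver rather than the CUDA toolkit, so a container may lack it unless the driver libraries are made available. The sketch below uses a hypothetical find_lib helper that looks in the ldconfig cache and then in each LD_LIBRARY_PATH directory. It does not reproduce every rule the loader applies (RPATH/RUNPATH, for instance, are ignored).

```shell
#!/bin/bash
# Sketch: can the dynamic loader find a library by soname?
# find_lib is a hypothetical helper; it checks the ldconfig cache and then
# each directory listed in LD_LIBRARY_PATH (RPATH/RUNPATH are not checked).
find_lib() {
  local name=$1 ldconfig dir
  ldconfig=$(command -v ldconfig || echo /sbin/ldconfig)
  # ldconfig -p lines look like: "<tab>libfoo.so.1 (libc6,x86-64) => /path"
  if "$ldconfig" -p 2>/dev/null | awk -v n="$name" '$1 == n { f = 1 } END { exit !f }'; then
    echo "$name: found in ldconfig cache"
    return 0
  fi
  local IFS=:                             # split LD_LIBRARY_PATH on ':'
  for dir in $LD_LIBRARY_PATH; do
    if [ -n "$dir" ] && [ -e "$dir/$name" ]; then
      echo "$name: found in $dir"
      return 0
    fi
  done
  echo "$name: not found"
  return 1
}

find_lib libcuda.so.1 || true
```
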
tddg commented 11 months ago

Hi @JainTwinkle, we've been trying to use CRAC but got stuck on the same issue: it segfaults while trying to locate the "mmap" and "sbrk" symbols in the symbol table of /lib64/ld-linux-x86-64.so.2.

We are using Ubuntu 20.04.6 LTS.

Could you advise how to address this issue? Thanks!
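
A hedged way to see what that loader file actually provides: the sketch below uses GNU nm (binutils) to look for mmap and sbrk, or their __-prefixed variants, first in the static symbol table (.symtab) and then in the dynamic one (.dynsym). It assumes, without confirmation from this thread, that CRAC's lookup depends on those tables. The helper names match_symbol and find_symbol and the LDSO variable are hypothetical. If the file is stripped, nm lists no static symbols, and the check falls through to the dynamic table.

```shell
#!/bin/bash
# Sketch: does the dynamic loader file define mmap/sbrk (or __mmap/__sbrk)?
# Assumes GNU binutils nm; the helper names are hypothetical.

# Read nm output on stdin; print the first symbol named SYM or __SYM.
match_symbol() {
  awk -v s="$1" '$NF == s || $NF == ("__" s) { print $NF; exit }'
}

# find_symbol FILE SYM: report which symbol table (if any) defines SYM.
find_symbol() {
  local file=$1 sym=$2 hit
  # Static table (.symtab); prints nothing if the file is stripped.
  hit=$(nm --defined-only "$file" 2>/dev/null | match_symbol "$sym")
  if [ -n "$hit" ]; then
    echo "$sym: $hit in static symbol table (.symtab)"
    return 0
  fi
  # Dynamic table (.dynsym): only the symbols the file exports.
  hit=$(nm -D --defined-only "$file" 2>/dev/null | match_symbol "$sym")
  if [ -n "$hit" ]; then
    echo "$sym: $hit in dynamic symbol table (.dynsym) only"
    return 0
  fi
  echo "$sym: not found"
  return 1
}

LDSO=${LDSO:-/lib64/ld-linux-x86-64.so.2}
for s in mmap sbrk; do
  find_symbol "$LDSO" "$s" || true
done
```

Set LDSO to check a different loader file. Comparing the output on a machine where CRAC works against the Ubuntu 20.04 machine may help narrow down the difference.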