Segmentation fault (Core dumped) when loading the compiled module fused

0sure commented 2 years ago

Hello, thank you for your valuable advice. I have solved the compilation problem with the fused module under PyTorch **1.7.1**, and I now have the required .so file:

(test4torch) yuanWang@server-TiTan:~/.cache/torch_extensions/fused$ ls
build.ninja  fused_bias_act_kernel.cuda.o  fused_bias_act.o  fused.so

I ran into a new problem when I tried to use _import_module_from_library to load the compiled fused module. I changed the code at the beginning of fused_act.py to:

import os
from torch.utils.cpp_extension import load, _import_module_from_library

try:
    # First try to load the prebuilt fused.so from the extension cache.
    user_home_path = os.path.expanduser('~')
    fused = _import_module_from_library('fused', user_home_path+'/.cache/torch_extensions/fused', True)
    print(f'Load fused from {user_home_path}/.cache/torch_extensions/fused')
    print("Load success!")
except Exception:
    # Fall back to JIT-compiling the extension from the C++/CUDA sources.
    module_path = os.path.dirname(__file__)
    fused = load(
        name='fused',
        sources=[
            os.path.join(module_path, 'fused_bias_act.cpp'),
            os.path.join(module_path, 'fused_bias_act_kernel.cu'),
        ],
        verbose=True
    )
    print('Load function used. Build fused from cpp & cu files')
Running the code step by step reports no error, but when I type quit(), the interpreter crashes with Segmentation fault (Core dumped). The gdb backtrace shows:

(gdb) run fused_act.py
Starting program: /data4/yuanWang/anaconda3/envs/test4torch/bin/python fused_act.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fff7d446700 (LWP 4153)]
Load fused from /data4/yuanWang/.cache/torch_extensions/fused
Load success!

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007fffdeafa9b8 in ?? ()
   from /data4/yuanWang/anaconda3/envs/test4torch/lib/python3.7/site-packages/torch/lib/../../../../libcudart.so.10.1
(gdb) backtrace
#0  0x00007fffdeafa9b8 in ?? ()
   from /data4/yuanWang/anaconda3/envs/test4torch/lib/python3.7/site-packages/torch/lib/../../../../libcudart.so.10.1
#1  0x00007fffdeafb1a3 in ?? ()
   from /data4/yuanWang/anaconda3/envs/test4torch/lib/python3.7/site-packages/torch/lib/../../../../libcudart.so.10.1
#2  0x00007fffdeafb8a5 in ?? ()
   from /data4/yuanWang/anaconda3/envs/test4torch/lib/python3.7/site-packages/torch/lib/../../../../libcudart.so.10.1
#3  0x00007ffff7806031 in __run_exit_handlers (status=0, 
    listp=0x7ffff7bae718 <__exit_funcs>, 
    run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true)
    at exit.c:108
#4  0x00007ffff780612a in __GI_exit (status=<optimized out>) at exit.c:139
#5  0x00007ffff77e4c8e in __libc_start_main (main=0x555555645ab0 <main>, argc=2, 
    argv=0x7fffffffe158, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=0x7fffffffe148)
    at ../csu/libc-start.c:344
#6  0x000055555572b73d in _start () at ../sysdeps/x86_64/elf/start.S:103

Do you know how to solve this problem? By the way, do you run the code on Windows? I strongly suspect that this problem is related to Ubuntu. Thank you for your time. Best wishes.
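
The backtrace above places the crash inside __run_exit_handlers, that is, in cleanup code that libcudart runs at process exit rather than in the load itself. One possible stopgap, offered only as a sketch and not as a fix for the underlying cause, is to end the process with os._exit, which skips exit handlers entirely. The small demonstration below uses only the standard library and a Python-level atexit handler as a stand-in; whether skipping the handlers is acceptable for a real training run is an assumption that would need checking, since buffers and files must then be flushed by hand. The names CHILD and run_child are illustrative.

```python
import subprocess
import sys
import textwrap

# Child script: registers an exit handler, then exits normally or via os._exit.
CHILD = textwrap.dedent("""
    import atexit, os, sys
    atexit.register(lambda: print("exit handler ran"))
    print("work done")
    sys.stdout.flush()          # os._exit does not flush stdio buffers
    if sys.argv[1] == "hard":
        os._exit(0)             # leave immediately, skipping exit handlers
""")


def run_child(mode):
    """Run the child script and return (return code, list of stdout lines)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, mode],
        capture_output=True, text=True,
    )
    return result.returncode, result.stdout.splitlines()


if __name__ == "__main__":
    print(run_child("normal"))
    print(run_child("hard"))
```

In "normal" mode the handler runs at interpreter shutdown; in "hard" mode the process ends before any handler is invoked.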

PeterWang512 commented 2 years ago

I am using ubuntu for this project. At this moment, I am not sure what's the cause of this issue, but I'll look into it.

0sure commented 2 years ago

> I am using ubuntu for this project. At this moment, I am not sure what's the cause of this issue, but I'll look into it.

Thank you very much! I am very interested in your project and spend hours studying it every day. I look forward to using your training code to train my own generative model as soon as possible!

PeterWang512 / GANSketching

Segmentation fault (Core dumped) when loading the compiled module fused #17