Tencent / ncnn

ncnn is a high-performance neural network inference framework optimized for the mobile platform

When used from a Python program, releasing dummy_image fails and raises SIGSEGV while the discrete GPU's VulkanDevice is destructed at Python exit #2666

Closed ArchieMeng closed 3 years ago

ArchieMeng commented 3 years ago

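Problem description:

The sample program of waifu2x-ncnn-vulkan-python (a wrapper around waifu2x-ncnn-vulkan, so it uses ncnn) hits a segmentation fault at exit: after the Waifu2x object has been destructed successfully, the crash occurs while destructing the dummy_image of ncnn::g_default_vkdev. It does not happen on a machine with an integrated GPU (i5 1035G7, Iris Plus), but it does happen on another machine with a discrete GPU (1050Ti). Both machines have a single GPU (one integrated only, the other discrete only). The original waifu2x-ncnn-vulkan program runs without problems on both machines. Both run Arch Linux.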

Steps to reproduce:

1. Build waifu2x-ncnn-vulkan-python.
2. From the build directory, run waifu2x_ncnn_vulkan.py (fix the image path in the script if it is wrong).

How to obtain the backtrace log:

cd waifu2x-ncnn-vulkan-python/src/build
gdb python
(gdb) b Waifu2x::~Waifu2x
(gdb) run waifu2x_ncnn_vulkan.py

Run until the crash.

GDB crash backtrace:

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ffff282ab60 in ?? ()
(gdb) backtrace
#0  0x00007ffff282ab60 in ?? ()
#1  0x00007ffff69572ef in ncnn::VkBlobAllocator::fastFree (this=0x555555e2d400, ptr=0x555555e2ebe0)
    at /home/kodi/OpensourceProjects/waifu2x-ncnn-vulkan-python/src/ncnn/src/allocator.cpp:1045
#2  0x00007ffff6830b1d in ncnn::VkImageMat::release (this=0x555555c94d00) at /home/kodi/OpensourceProjects/waifu2x-ncnn-vulkan-python/src/ncnn/src/mat.h:2217
#3  0x00007ffff6843830 in ncnn::VulkanDevicePrivate::destroy_dummy_buffer_image (this=0x555555c94bb0)
    at /home/kodi/OpensourceProjects/waifu2x-ncnn-vulkan-python/src/ncnn/src/gpu.cpp:1633
#4  0x00007ffff67bb4ca in ncnn::VulkanDevice::~VulkanDevice (this=<optimized out>, this=<optimized out>)
    at /home/kodi/OpensourceProjects/waifu2x-ncnn-vulkan-python/src/ncnn/src/gpu.cpp:2007
#5  0x00007ffff684341d in ncnn::destroy_gpu_instance () at /home/kodi/OpensourceProjects/waifu2x-ncnn-vulkan-python/src/ncnn/src/gpu.cpp:1469
#6  0x00007ffff67b9b93 in ncnn::__ncnn_vulkan_instance_holder::~__ncnn_vulkan_instance_holder (this=<optimized out>, this=<optimized out>)
    at /home/kodi/OpensourceProjects/waifu2x-ncnn-vulkan-python/src/ncnn/src/gpu.cpp:50
#7  0x00007ffff7a45db7 in __run_exit_handlers () from /usr/lib/libc.so.6
#8  0x00007ffff7a45f5e in exit () from /usr/lib/libc.so.6
#9  0x00007ffff7a2e159 in __libc_start_main () from /usr/lib/libc.so.6
#10 0x000055555555504e in _start ()

nihui commented 3 years ago

So, does calling ncnn::destroy_gpu_instance() before exit avoid it?

ArchieMeng commented 3 years ago

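> So, does calling ncnn::destroy_gpu_instance() before exit avoid it?

With that change, the integrated-GPU machine crashes too. Calling ncnn::destroy_gpu_instance() at shutdown (either at the end of the Python program, or in the destructor of the Waifu2x class or its derived class) causes a crash in every case. The backtrace is different this time, though: the crash now occurs while destructing ncnn::Net net, a member variable of Waifu2x. Waifu2xWrapped is a class derived from Waifu2x.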

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ffff2841a20 in ?? ()
(gdb) backtrace
#0  0x00007ffff2841a20 in ?? ()
#1  0x00007ffff69604a0 in ncnn::VkWeightAllocator::clear (this=0x55555799f1e0)
    at /home/kodi/OpensourceProjects/waifu2x-ncnn-vulkan-python/src/ncnn/src/allocator.cpp:1115
#2  0x00007ffff68170f4 in ncnn::VkWeightAllocator::~VkWeightAllocator (this=<optimized out>, this=<optimized out>)
    at /home/kodi/OpensourceProjects/waifu2x-ncnn-vulkan-python/src/ncnn/src/allocator.cpp:1087
#3  0x00007ffff696037e in ncnn::VkWeightAllocator::~VkWeightAllocator (this=<optimized out>, this=<optimized out>)
    at /home/kodi/OpensourceProjects/waifu2x-ncnn-vulkan-python/src/ncnn/src/allocator.cpp:1090
#4  0x00007ffff687c48a in ncnn::Net::clear (this=0x55555563a508) at /home/kodi/OpensourceProjects/waifu2x-ncnn-vulkan-python/src/ncnn/src/net.cpp:2504
#5  0x00007ffff67ca516 in ncnn::Net::~Net (this=<optimized out>, this=<optimized out>)
    at /home/kodi/OpensourceProjects/waifu2x-ncnn-vulkan-python/src/ncnn/src/net.cpp:1729
#6  0x00007ffff67c137c in Waifu2x::~Waifu2x (this=<optimized out>, this=<optimized out>)
    at /home/kodi/OpensourceProjects/waifu2x-ncnn-vulkan-python/src/waifu2x.cpp:25
#7  0x00007ffff67c0e44 in Waifu2xWrapped::~Waifu2xWrapped (this=<optimized out>, this=<optimized out>)
    at /home/kodi/OpensourceProjects/waifu2x-ncnn-vulkan-python/src/./waifu2x_wrapped.h:22
#8  0x00007ffff682b49e in _wrap_delete_Waifu2xWrapped (args=0x7ffff6480db0)
    at /home/kodi/OpensourceProjects/waifu2x-ncnn-vulkan-python/src/build/CMakeFiles/waifu2x_ncnn_vulkan_wrapper.dir/waifu2xPYTHON_wrap.cxx:4595
#9  0x00007ffff6825537 in SwigPyObject_dealloc (v=0x7ffff6480db0)
    at /home/kodi/OpensourceProjects/waifu2x-ncnn-vulkan-python/src/build/CMakeFiles/waifu2x_ncnn_vulkan_wrapper.dir/waifu2xPYTHON_wrap.cxx:1573
#10 0x00007ffff7cfc286 in ?? () from /usr/lib/libpython3.9.so.1.0
#11 0x00007ffff7d31e83 in ?? () from /usr/lib/libpython3.9.so.1.0
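
ArchieMeng commented 3 years ago

I later built it successfully on Windows, and the problem does not occur there. I am starting to suspect an Nvidia driver issue on Linux. I will reopen this if I get more information.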