CHIP-SPV / chipStar

chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs.

hipMemcpyAsync Multi Threaded double free or corruption #815

Closed: pvelesko closed this issue 6 months ago

pvelesko commented 6 months ago
The following tests FAILED:
    516 - Unit_hipMemcpy_MultiThread-AllAPIs (SEGFAULT)
    602 - Unit_hipMemcpyWithStream_MultiThread (Subprocess aborted)
    639 - Unit_hipMemcpyAsync_hipMultiMemcpyMultiThread - int (SEGFAULT)
    640 - Unit_hipMemcpyAsync_hipMultiMemcpyMultiThread - float (SEGFAULT)
    642 - Unit_hipMemcpyAsync_hipMultiMemcpyMultiThreadMultiStream - int (SEGFAULT)
    643 - Unit_hipMemcpyAsync_hipMultiMemcpyMultiThreadMultiStream - float (SEGFAULT)

This seems to be related to lazy JIT compilation?

Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=126992568259520) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
[Current thread is 1 (Thread 0x737fc2eb93c0 (LWP 3261184))]
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=126992568259520) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=126992568259520) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=126992568259520, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x0000737fc2842476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x0000737fc28287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x0000737fc2889676 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x737fc29dbb77 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#6  0x0000737fc28a0cfc in malloc_printerr (str=str@entry=0x737fc29de748 "double free or corruption (fasttop)") at ./malloc/malloc.c:5664
#7  0x0000737fc28a29ca in _int_free (av=0x737fa4000030, p=0x737fa401a8b0, have_lock=0) at ./malloc/malloc.c:4539
#8  0x0000737fc28a5453 in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3391
#9  0x0000737fc2fa5268 in std::__new_allocator<std::__detail::_Hash_node<std::pair<void const* const, chipstar::Module*>, false> >::deallocate (this=0x575d67f2e950, __p=0x31c300, __n=1)
    at /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/new_allocator.h:158
#10 std::allocator_traits<std::allocator<std::__detail::_Hash_node<std::pair<void const* const, chipstar::Module*>, false> > >::deallocate (__a=..., __p=0x31c300, __n=1)
    at /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/alloc_traits.h:496
#11 std::__detail::_Hashtable_alloc<std::allocator<std::__detail::_Hash_node<std::pair<void const* const, chipstar::Module*>, false> > >::_M_deallocate_node_ptr (this=0x575d67f2e950, __n=0x31c300)
    at /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/hashtable_policy.h:1995
#12 std::__detail::_Hashtable_alloc<std::allocator<std::__detail::_Hash_node<std::pair<void const* const, chipstar::Module*>, false> > >::_M_deallocate_node (this=0x575d67f2e950, __n=0x31c300)
    at /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/hashtable_policy.h:1985
#13 std::__detail::_Hashtable_alloc<std::allocator<std::__detail::_Hash_node<std::pair<void const* const, chipstar::Module*>, false> > >::_M_deallocate_nodes (this=0x575d67f2e950, __n=0x737893fa516a)
    at /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/hashtable_policy.h:2006
#14 std::_Hashtable<void const*, std::pair<void const* const, chipstar::Module*>, std::allocator<std::pair<void const* const, chipstar::Module*> >, std::__detail::_Select1st, std::equal_to<void const*>, std::hash<void const*>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::clear (this=0x575d67f2e950)
    at /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/hashtable.h:2500
#15 std::_Hashtable<void const*, std::pair<void const* const, chipstar::Module*>, std::allocator<std::pair<void const* const, chipstar::Module*> >, std::__detail::_Select1st, std::equal_to<void const*>, std::hash<void const*>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::~_Hashtable (this=0x575d67f2e950)
    at /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/hashtable.h:1584
#16 std::unordered_map<void const*, chipstar::Module*, std::hash<void const*>, std::equal_to<void const*>, std::allocator<std::pair<void const* const, chipstar::Module*> > >::~unordered_map (this=0x575d67f2e950)
    at /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/unordered_map.h:102
#17 chipstar::Device::~Device (this=0x575d67f2e910) at /space/pvelesko/chipStar/main/src/CHIPBackend.cc:545
#18 0x0000737fc3067219 in CHIPDeviceLevel0::~CHIPDeviceLevel0 (this=0x31c300) at /space/pvelesko/chipStar/main/src/backend/Level0/CHIPBackendLevel0.hh:527
#19 0x0000737fc305ab06 in CHIPContextLevel0::~CHIPContextLevel0 (this=0x575d67f2e780) at /space/pvelesko/chipStar/main/src/backend/Level0/CHIPBackendLevel0.cc:1971
#20 0x0000737fc305ab99 in CHIPContextLevel0::~CHIPContextLevel0 (this=0x31c300) at /space/pvelesko/chipStar/main/src/backend/Level0/CHIPBackendLevel0.cc:1943
#21 0x0000737fc2faafc0 in chipstar::Backend::~Backend (this=0x575d66988da0) at /space/pvelesko/chipStar/main/src/CHIPBackend.cc:1184
#22 0x0000737fc30670e9 in CHIPBackendLevel0::~CHIPBackendLevel0 (this=0x31c300) at /space/pvelesko/chipStar/main/src/backend/Level0/CHIPBackendLevel0.hh:622
#23 0x0000737fc2f9bc27 in CHIPUninitializeCallOnce () at /space/pvelesko/chipStar/main/src/CHIPDriver.cc:146
#24 0x0000737fc2899ee8 in __pthread_once_slow (once_control=0x737fc30c8528 <Uninitialized>, init_routine=0x737fc2cdad50 <__once_proxy>) at ./nptl/pthread_once.c:116
#25 0x0000737fc2f9b918 in __gthread_once (__once=0x31c300, __func=0x31c300) at /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/gthr-default.h:700
#26 std::call_once<void (*)()> (__once=..., __f=@0x7ffc88b580c8: 0x737fc2f9bb50 <CHIPUninitializeCallOnce()>) at /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/mutex:859
#27 CHIPUninitialize () at /space/pvelesko/chipStar/main/src/CHIPDriver.cc:152
#28 0x0000737fc30001ed in __hipUnregisterFatBinary (Data=0x575d67f3b2e0) at /space/pvelesko/chipStar/main/src/CHIPBindings.cc:4505
#29 0x0000575d6617d982 in __hip_module_dtor ()
#30 0x0000737fc2845495 in __run_exit_handlers (status=0, listp=0x737fc2a1a838

pvelesko commented 6 months ago

Fixed in #817
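
For readers following the "lazy jit" hypothesis: the backtrace aborts inside `chipstar::Device::~Device` while a `std::unordered_map<void const*, chipstar::Module*>` is being destroyed. One general way a map like that can end up with corrupted nodes is several threads inserting into it at the same time without a lock, for example when each thread triggers lazy compilation on first use. The sketch below illustrates that class of bug and the usual guard. It is not chipStar code, and nothing in this thread says it is what #817 changed; `ModuleCache`, `getOrCompile`, and `runConcurrentLookups` are hypothetical names invented for the example.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a lazily compiled module; not chipstar::Module.
struct Module {
  explicit Module(const void *HostPtr) : HostPtr(HostPtr) {}
  const void *HostPtr;
};

// Hypothetical cache keyed by host pointer. It mirrors the key type of the
// map in the backtrace, but owns its values through unique_ptr.
class ModuleCache {
public:
  // Returns the module for Key, creating it on first use. The lock covers
  // both the lookup and the insertion, so two threads can never insert into
  // (or rehash) the map at the same time.
  Module *getOrCompile(const void *Key) {
    std::lock_guard<std::mutex> Lock(Mtx);
    auto It = Modules.find(Key);
    if (It != Modules.end())
      return It->second.get();
    ++CompileCount;
    auto Inserted = Modules.emplace(Key, std::make_unique<Module>(Key));
    return Inserted.first->second.get();
  }

  // Number of modules created so far.
  int compileCount() const {
    std::lock_guard<std::mutex> Lock(Mtx);
    return CompileCount;
  }

private:
  mutable std::mutex Mtx;
  std::unordered_map<const void *, std::unique_ptr<Module>> Modules;
  int CompileCount = 0;
};

// Spawns NumThreads threads that all look up the same NumKeys keys and
// returns how many modules were actually created.
int runConcurrentLookups(int NumThreads, int NumKeys) {
  assert(NumThreads > 0 && NumKeys > 0);
  std::vector<int> Keys(NumKeys); // element addresses serve as distinct keys
  ModuleCache Cache;
  std::vector<std::thread> Workers;
  for (int T = 0; T < NumThreads; ++T)
    Workers.emplace_back([&] {
      for (int K = 0; K < NumKeys; ++K)
        Cache.getOrCompile(&Keys[K]);
    });
  for (auto &W : Workers)
    W.join(); // all workers finish before Cache is destroyed
  return Cache.compileCount();
}
```

With the lock in place, `runConcurrentLookups(8, 64)` creates exactly 64 modules regardless of thread interleaving. Removing the `std::lock_guard` from `getOrCompile` turns the concurrent `find`/`emplace` calls into a data race, which is undefined behavior and can surface much later as heap corruption when the map is torn down.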