Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

Segmentation fault in LookupBucketFor with a SYCL parallel reduction using pocl CUDA device #47489

Open Quuxplusone opened 3 years ago

Quuxplusone commented 3 years ago
Bugzilla Link PR48520
Status NEW
Importance P enhancement
Reported by Giuseppe Bilotta (giuseppe.bilotta@gmail.com)
Reported on 2020-12-15 11:24:44 -0800
Last modified on 2020-12-18 01:24:17 -0800
Version 11.0
Hardware PC Linux
CC llvm-bugs@lists.llvm.org
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
See also pocl issue #895 at https://github.com/pocl/pocl/issues/895

I am getting a segmentation fault in LLVM 11 (reproduced also in LLVM 10) from
Debian unstable when using pocl's CUDA device with an NVIDIA GPU to run a
simple parallel reduction implemented in SYCL and compiled with CodePlay's
ComputeCpp CE 2.3

The backtrace is the following:

#0  0x00007fffe9644636 in LookupBucketFor<llvm::Type const*> () at /build/llvm-
toolchain-11-eHqKZY/llvm-toolchain-11-11.0.0/llvm/include/llvm/ADT/DenseMap.h:613
#1  find () at /build/llvm-toolchain-11-eHqKZY/llvm-toolchain-11-
11.0.0/llvm/include/llvm/ADT/DenseMap.h:161
#2  getTypeID () at /build/llvm-toolchain-11-eHqKZY/llvm-toolchain-11-
11.0.0/llvm/lib/Bitcode/Writer/ValueEnumerator.h:166
#3  writeConstants () at /build/llvm-toolchain-11-eHqKZY/llvm-toolchain-11-
11.0.0/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:2545
#4  0x00007fffe9632622 in writeModuleConstants () at /build/llvm-toolchain-11-
eHqKZY/llvm-toolchain-11-11.0.0/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:2568
#5  write () at /build/llvm-toolchain-11-eHqKZY/llvm-toolchain-11-
11.0.0/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:4298
#6  0x00007fffe962dd3c in writeModule () at /build/llvm-toolchain-11-
eHqKZY/llvm-toolchain-11-11.0.0/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:4494
#7  0x00007fffe963803b in WriteBitcodeToFile () at /build/llvm-toolchain-11-
eHqKZY/llvm-toolchain-11-11.0.0/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:4520
#8  0x00007fffdfeecf0c in pocl_write_module (module=0x7ffff0002670,
path=0x7ffff7566ff0
"/home/oblomov/.cache/pocl/kcache/DD/ACFLBNIKIOELDBLLGLMICHPFMGGGHOPMOFLBA/SYCL_struct_vecinit/1024-1-1-goffs0-smallgrid/parallel.bc",
    dont_rewrite=0) at /home/oblomov/src/opencl/pocl/lib/llvmopencl/LLVMFileUtils.cc:421
#9  0x00007fffdfebe09b in pocl_cache_write_kernel_parallel_bc
(bc=0x7ffff0002670, program=0x43b2a0, device_i=0, kernel=0x1220780,
command=0x7ffff00020c0, specialize=1) at
/home/oblomov/src/opencl/pocl/lib/CL/pocl_cache.c:359
#10 0x00007fffdff4aa62 in pocl_llvm_generate_workgroup_function (DeviceI=0,
Device=0x779078, Kernel=0x1220780, Command=0x7ffff00020c0, Specialize=1) at
/home/oblomov/src/opencl/pocl/lib/CL/pocl_llvm_wg.cc:485
#11 0x00007fffdc0a0a66 in load_or_generate_kernel (kernel=0x1220780,
device=0x779078, has_offsets=0, device_i=0, command=0x7ffff00020c0,
specialized=1) at /home/oblomov/src/opencl/pocl/lib/CL/devices/cuda/pocl-
cuda.c:827
#12 0x00007fffdc0a104e in pocl_cuda_submit_kernel (stream=0xdc5a20,
cmd=0x7ffff00020c0, device=0x779078, event=0x7ffff00021a0) at
/home/oblomov/src/opencl/pocl/lib/CL/devices/cuda/pocl-cuda.c:913
#13 0x00007fffdc0a2207 in pocl_cuda_submit_node (node=0x7ffff00020c0,
cq=0x438f20, locked=1) at
/home/oblomov/src/opencl/pocl/lib/CL/devices/cuda/pocl-cuda.c:1256
#14 0x00007fffdc0a2530 in pocl_cuda_submit (node=0x7ffff00020c0, cq=0x438f20)
at /home/oblomov/src/opencl/pocl/lib/CL/devices/cuda/pocl-cuda.c:1320
#15 0x00007fffdfea8c84 in pocl_command_enqueue (command_queue=0x438f20,
node=0x7ffff00020c0) at /home/oblomov/src/opencl/pocl/lib/CL/pocl_util.c:576
#16 0x00007fffdfe89047 in POclEnqueueNDRangeKernel (command_queue=0x438f20,
kernel=0x1220780, work_dim=1, global_work_offset=0x7ffff7568b10,
global_work_size=0x7ffff7568af0, local_work_size=0x0, num_events_in_wait_list=0,
    event_wait_list=0x0, event=0x7ffff7568ad8) at /home/oblomov/src/opencl/pocl/lib/CL/clEnqueueNDRangeKernel.c:369
#17 0x00007ffff7c94ccb in
cl::sycl::detail::queue::enqueue_nd_range(std::shared_ptr<cl::sycl::detail::kernel>
const&, cl::sycl::detail::nd_range_base const&, int, std::vector<_cl_event*,
std::allocator<_cl_event*> > const&) ()
   from /opt/codeplay/computecpp/bin/../lib/libComputeCpp.so
#18 0x00007ffff7c7958b in
cl::sycl::detail::kernel_container::run(std::shared_ptr<cl::sycl::detail::queue>
const&, cl::sycl::detail::event_list*, bool) () from
/opt/codeplay/computecpp/bin/../lib/libComputeCpp.so
#19 0x00007ffff7c3b154 in
cl::sycl::detail::command::run_kernel(std::shared_ptr<cl::sycl::detail::queue>
const&, cl::sycl::detail::event_list*, bool) () from
/opt/codeplay/computecpp/bin/../lib/libComputeCpp.so
#20 0x00007ffff7ca4c60 in
cl::sycl::detail::transaction::commit(std::shared_ptr<cl::sycl::detail::queue>
const&) () from /opt/codeplay/computecpp/bin/../lib/libComputeCpp.so
#21 0x00007ffff7ca7f63 in
cl::sycl::detail::transaction::state<(cl::sycl::detail::trans_detail::status_t)3>::enter()
() from /opt/codeplay/computecpp/bin/../lib/libComputeCpp.so
#22 0x00007ffff7ca68d1 in
cl::sycl::detail::transaction::change_to(cl::sycl::detail::trans_detail::status_t,
cl::sycl::detail::trans_detail::status_t) () from
/opt/codeplay/computecpp/bin/../lib/libComputeCpp.so
#23 0x00007ffff7ca6feb in cl::sycl::detail::transaction::next_state() () from
/opt/codeplay/computecpp/bin/../lib/libComputeCpp.so
#24 0x00007ffff7c984d9 in cl::sycl::detail::scheduler::execute_front() () from
/opt/codeplay/computecpp/bin/../lib/libComputeCpp.so
#25 0x00007ffff7c988e1 in cl::sycl::detail::scheduler::scheduler_loop() () from
/opt/codeplay/computecpp/bin/../lib/libComputeCpp.so
#26 0x00007ffff7ef2120 in execute_native_thread_routine () from
/opt/codeplay/computecpp/bin/../lib/libComputeCpp.so
#27 0x00007ffff7580ea7 in start_thread (arg=<optimized out>) at
pthread_create.c:477
#28 0x00007ffff769fd8f in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Quuxplusone commented 3 years ago

Some additional information:

I do understand that it's possible that the issue may be in the SPIR produced by the CodePlay compiler, but even if this is the case, I don't think the LLVM library should crash instead of throwing an appropriate error (assuming, of course, that the SPIR anomaly can be caught).