Xilinx / XRT

Run Time for AIE and FPGA based platforms
https://xilinx.github.io/XRT

XRT stay stuck while destroying an xrt::bo #6588

Open Ralender opened 2 years ago

Ralender commented 2 years ago

The provided test case is mostly automatically generated from an issue found in our runtime built on top of XRT.

steps to reproduce:

> export XCL_EMULATION_MODE=hw_emu
> unzip reprod.zip
> clang++ -o reprod host.cpp -g -I/opt/xilinx/xrt/include -L/opt/xilinx/xrt/lib -lOpenCL -luuid -lxrt_coreutil
> ./reprod
This hangs after printing PASS, while destroying an xrt::bo.

Here are the host.cpp and the device.xclbin: reprod.zip

The test is a simple vector add. Here is the host side, which is also contained in reprod.zip:

#include <xrt/xrt_kernel.h>
#include <xrt.h>
#include <array>

int main() {

/// above is not automatically generated

// from: _pi_platform::_pi_platform(unsigned int) sycl/plugins/xrt/pi_xrt.cpp:473
// call str:xrt::device(i)
auto name0 = xrt::device(((unsigned int)0));

// xclbin buffer size=20946875
// from: pi_result xrt_piProgramCreateWithBinary(pi_context, pi_uint32, const pi_device *, const size_t *, const unsigned char **, size_t, const pi_device_binary_property *, pi_int32 *, pi_program *) sycl/plugins/xrt/pi_xrt.cpp:1908
// call str:xrt::xclbin(reinterpret_cast<const axlf *>(binaries[0]))
auto name1 = xrt::xclbin("device.xclbin");

// from: pi_result xrt_piProgramCreateWithBinary(pi_context, pi_uint32, const pi_device *, const size_t *, const unsigned char **, size_t, const pi_device_binary_property *, pi_int32 *, pi_program *) sycl/plugins/xrt/pi_xrt.cpp:1909
// call str:dev->get().load_xclbin(xclbin)
auto name2 = name0.load_xclbin(name1);

// from: pi_result xrt_piKernelCreate(pi_program, const char *, pi_kernel *) sycl/plugins/xrt/pi_xrt.cpp:1755
// call str:xrt::kernel(program->device_->get(), program->uuid_, kernel_name)
auto name3 = xrt::kernel(name0, name2, "dlerEE_clES2_E6Kernel_0i996ruy");

// from: pi_result xrt_piKernelCreate(pi_program, const char *, pi_kernel *) sycl/plugins/xrt/pi_xrt.cpp:1756
// call str:program->bin_.get_kernel(kernel_name)
auto name4 = name1.get_kernel("dlerEE_clES2_E6Kernel_0i996ruy");

// from: pi_result xrt_piKernelCreate(pi_program, const char *, pi_kernel *) sycl/plugins/xrt/pi_xrt.cpp:1757
// call str:xrt::run(ker)
auto name5 = xrt::run(name3);

// from: pi_result xrt_piKernelSetArg(pi_kernel, pi_uint32, size_t, const void *) sycl/plugins/xrt/pi_xrt.cpp:1771
std::array<char, 4> name6= {'\x04', '\x00', '\x00', '\x00'};

// from: pi_result xrt_piKernelSetArg(pi_kernel, pi_uint32, size_t, const void *) sycl/plugins/xrt/pi_xrt.cpp:1772
// call str:kernel->run_.set_arg(arg_index, arg_value, arg_size)
name5.set_arg(((unsigned int)0), ((const void*)name6.data()), ((unsigned long)4));

// from: void _pi_mem::map_if_needed(const xrt::device &) sycl/plugins/xrt/pi_xrt.cpp:572
// call str:xrt::bo(device, mem.size, XRT_BO_FLAGS_NONE, 0)
auto name7 = xrt::bo(name0, ((unsigned long)16), ((int)0), ((int)0));

// from: void _pi_mem::map_if_needed(const xrt::device &) sycl/plugins/xrt/pi_xrt.cpp:573
// call str:buffer_.map()
auto name8 = name7.map();

// from: pi_result xrt_piextKernelSetArgMemObj(pi_kernel, pi_uint32, const pi_mem *) sycl/plugins/xrt/pi_xrt.cpp:1784
// call str:kernel->run_.set_arg(arg_index, buf->buffer_)
name5.set_arg(((unsigned int)1), name7);

// from: pi_result xrt_piKernelSetArg(pi_kernel, pi_uint32, size_t, const void *) sycl/plugins/xrt/pi_xrt.cpp:1771
std::array<char, 8> name9= {'\x04', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00'};

// from: pi_result xrt_piKernelSetArg(pi_kernel, pi_uint32, size_t, const void *) sycl/plugins/xrt/pi_xrt.cpp:1772
// call str:kernel->run_.set_arg(arg_index, arg_value, arg_size)
name5.set_arg(((unsigned int)2), ((const void*)name9.data()), ((unsigned long)8));

// from: pi_result xrt_piKernelSetArg(pi_kernel, pi_uint32, size_t, const void *) sycl/plugins/xrt/pi_xrt.cpp:1771
std::array<char, 8> name10= {'\x04', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00'};

// from: pi_result xrt_piKernelSetArg(pi_kernel, pi_uint32, size_t, const void *) sycl/plugins/xrt/pi_xrt.cpp:1772
// call str:kernel->run_.set_arg(arg_index, arg_value, arg_size)
name5.set_arg(((unsigned int)3), ((const void*)name10.data()), ((unsigned long)8));

// from: pi_result xrt_piKernelSetArg(pi_kernel, pi_uint32, size_t, const void *) sycl/plugins/xrt/pi_xrt.cpp:1771
std::array<char, 8> name11= {'\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00'};

// from: pi_result xrt_piKernelSetArg(pi_kernel, pi_uint32, size_t, const void *) sycl/plugins/xrt/pi_xrt.cpp:1772
// call str:kernel->run_.set_arg(arg_index, arg_value, arg_size)
name5.set_arg(((unsigned int)4), ((const void*)name11.data()), ((unsigned long)8));

// from: void _pi_mem::map_if_needed(const xrt::device &) sycl/plugins/xrt/pi_xrt.cpp:572
// call str:xrt::bo(device, mem.size, XRT_BO_FLAGS_NONE, 0)
auto name12 = xrt::bo(name0, ((unsigned long)16), ((int)0), ((int)0));

// from: void _pi_mem::map_if_needed(const xrt::device &) sycl/plugins/xrt/pi_xrt.cpp:573
// call str:buffer_.map()
auto name13 = name12.map();

// from: auto xrt_piEnqueueMemBufferWrite(pi_queue, pi_mem, pi_bool, size_t, size_t, const void *, pi_uint32, const pi_event *, pi_event *)::(anonymous class)::operator()() const sycl/plugins/xrt/pi_xrt.cpp:1713
std::array<char, 16> name14= {'\x00', '\x00', '\x00', '\x00', '\x01', '\x00', '\x00', '\x00', '\x02', '\x00', '\x00', '\x00', '\x03', '\x00', '\x00', '\x00'};

// from: auto xrt_piEnqueueMemBufferWrite(pi_queue, pi_mem, pi_bool, size_t, size_t, const void *, pi_uint32, const pi_event *, pi_event *)::(anonymous class)::operator()() const sycl/plugins/xrt/pi_xrt.cpp:1714
// call str:(void)std::memcpy(adjusted_ptr, ptr, size)
(void)std::memcpy(name13, ((const void*)name14.data()), ((unsigned long)16));

// from: auto xrt_piEnqueueMemBufferWrite(pi_queue, pi_mem, pi_bool, size_t, size_t, const void *, pi_uint32, const pi_event *, pi_event *)::(anonymous class)::operator()() const sycl/plugins/xrt/pi_xrt.cpp:1715
// call str:buffer->buffer_.sync(XCL_BO_SYNC_BO_TO_DEVICE)
name12.sync(((xclBOSyncDirection)0));

// from: pi_result xrt_piextKernelSetArgMemObj(pi_kernel, pi_uint32, const pi_mem *) sycl/plugins/xrt/pi_xrt.cpp:1784
// call str:kernel->run_.set_arg(arg_index, buf->buffer_)
name5.set_arg(((unsigned int)5), name12);

// from: pi_result xrt_piKernelSetArg(pi_kernel, pi_uint32, size_t, const void *) sycl/plugins/xrt/pi_xrt.cpp:1771
std::array<char, 8> name15= {'\x04', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00'};

// from: pi_result xrt_piKernelSetArg(pi_kernel, pi_uint32, size_t, const void *) sycl/plugins/xrt/pi_xrt.cpp:1772
// call str:kernel->run_.set_arg(arg_index, arg_value, arg_size)
name5.set_arg(((unsigned int)6), ((const void*)name15.data()), ((unsigned long)8));

// from: pi_result xrt_piKernelSetArg(pi_kernel, pi_uint32, size_t, const void *) sycl/plugins/xrt/pi_xrt.cpp:1771
std::array<char, 8> name16= {'\x04', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00'};

// from: pi_result xrt_piKernelSetArg(pi_kernel, pi_uint32, size_t, const void *) sycl/plugins/xrt/pi_xrt.cpp:1772
// call str:kernel->run_.set_arg(arg_index, arg_value, arg_size)
name5.set_arg(((unsigned int)7), ((const void*)name16.data()), ((unsigned long)8));

// from: pi_result xrt_piKernelSetArg(pi_kernel, pi_uint32, size_t, const void *) sycl/plugins/xrt/pi_xrt.cpp:1771
std::array<char, 8> name17= {'\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00'};

// from: pi_result xrt_piKernelSetArg(pi_kernel, pi_uint32, size_t, const void *) sycl/plugins/xrt/pi_xrt.cpp:1772
// call str:kernel->run_.set_arg(arg_index, arg_value, arg_size)
name5.set_arg(((unsigned int)8), ((const void*)name17.data()), ((unsigned long)8));

// from: void _pi_mem::map_if_needed(const xrt::device &) sycl/plugins/xrt/pi_xrt.cpp:572
// call str:xrt::bo(device, mem.size, XRT_BO_FLAGS_NONE, 0)
auto name18 = xrt::bo(name0, ((unsigned long)16), ((int)0), ((int)0));

// from: void _pi_mem::map_if_needed(const xrt::device &) sycl/plugins/xrt/pi_xrt.cpp:573
// call str:buffer_.map()
auto name19 = name18.map();

// from: auto xrt_piEnqueueMemBufferWrite(pi_queue, pi_mem, pi_bool, size_t, size_t, const void *, pi_uint32, const pi_event *, pi_event *)::(anonymous class)::operator()() const sycl/plugins/xrt/pi_xrt.cpp:1713
std::array<char, 16> name20= {'\x01', '\x00', '\x00', '\x00', '\x02', '\x00', '\x00', '\x00', '\x03', '\x00', '\x00', '\x00', '\x04', '\x00', '\x00', '\x00'};

// from: auto xrt_piEnqueueMemBufferWrite(pi_queue, pi_mem, pi_bool, size_t, size_t, const void *, pi_uint32, const pi_event *, pi_event *)::(anonymous class)::operator()() const sycl/plugins/xrt/pi_xrt.cpp:1714
// call str:(void)std::memcpy(adjusted_ptr, ptr, size)
(void)std::memcpy(name19, ((const void*)name20.data()), ((unsigned long)16));

// from: auto xrt_piEnqueueMemBufferWrite(pi_queue, pi_mem, pi_bool, size_t, size_t, const void *, pi_uint32, const pi_event *, pi_event *)::(anonymous class)::operator()() const sycl/plugins/xrt/pi_xrt.cpp:1715
// call str:buffer->buffer_.sync(XCL_BO_SYNC_BO_TO_DEVICE)
name18.sync(((xclBOSyncDirection)0));

// from: pi_result xrt_piextKernelSetArgMemObj(pi_kernel, pi_uint32, const pi_mem *) sycl/plugins/xrt/pi_xrt.cpp:1784
// call str:kernel->run_.set_arg(arg_index, buf->buffer_)
name5.set_arg(((unsigned int)9), name18);

// from: pi_result xrt_piKernelSetArg(pi_kernel, pi_uint32, size_t, const void *) sycl/plugins/xrt/pi_xrt.cpp:1771
std::array<char, 8> name21= {'\x04', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00'};

// from: pi_result xrt_piKernelSetArg(pi_kernel, pi_uint32, size_t, const void *) sycl/plugins/xrt/pi_xrt.cpp:1772
// call str:kernel->run_.set_arg(arg_index, arg_value, arg_size)
name5.set_arg(((unsigned int)10), ((const void*)name21.data()), ((unsigned long)8));

// from: pi_result xrt_piKernelSetArg(pi_kernel, pi_uint32, size_t, const void *) sycl/plugins/xrt/pi_xrt.cpp:1771
std::array<char, 8> name22= {'\x04', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00'};

// from: pi_result xrt_piKernelSetArg(pi_kernel, pi_uint32, size_t, const void *) sycl/plugins/xrt/pi_xrt.cpp:1772
// call str:kernel->run_.set_arg(arg_index, arg_value, arg_size)
name5.set_arg(((unsigned int)11), ((const void*)name22.data()), ((unsigned long)8));

// from: pi_result xrt_piKernelSetArg(pi_kernel, pi_uint32, size_t, const void *) sycl/plugins/xrt/pi_xrt.cpp:1771
std::array<char, 8> name23= {'\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00'};

// from: pi_result xrt_piKernelSetArg(pi_kernel, pi_uint32, size_t, const void *) sycl/plugins/xrt/pi_xrt.cpp:1772
// call str:kernel->run_.set_arg(arg_index, arg_value, arg_size)
name5.set_arg(((unsigned int)12), ((const void*)name23.data()), ((unsigned long)8));

// from: pi_result xrt_piEnqueueKernelLaunch(pi_queue, pi_kernel, pi_uint32, const size_t *, const size_t *, const size_t *, pi_uint32, const pi_event *, pi_event *) sycl/plugins/xrt/pi_xrt.cpp:1813
// call str:kernel->run_.start()
name5.start();

// from: pi_result xrt_piEnqueueKernelLaunch(pi_queue, pi_kernel, pi_uint32, const size_t *, const size_t *, const size_t *, pi_uint32, const pi_event *, pi_event *) sycl/plugins/xrt/pi_xrt.cpp:1814
// call str:kernel->run_.wait()
name5.wait();

// from: pi_result xrt_piEnqueueMemBufferRead(pi_queue, pi_mem, pi_bool, size_t, size_t, void *, pi_uint32, const pi_event *, pi_event *) sycl/plugins/xrt/pi_xrt.cpp:1734
// call str:buffer->buffer_.sync(XCL_BO_SYNC_BO_FROM_DEVICE)
name7.sync(((xclBOSyncDirection)1));

// from: pi_result xrt_piEnqueueMemBufferRead(pi_queue, pi_mem, pi_bool, size_t, size_t, void *, pi_uint32, const pi_event *, pi_event *) sycl/plugins/xrt/pi_xrt.cpp:1736
std::array<char, 16> name24= {'\x10', '\xb4', '\x08', '\x02', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00'};

// from: pi_result xrt_piEnqueueMemBufferRead(pi_queue, pi_mem, pi_bool, size_t, size_t, void *, pi_uint32, const pi_event *, pi_event *) sycl/plugins/xrt/pi_xrt.cpp:1737
// call str:(void)std::memcpy(ptr, adjusted_ptr, size)
(void)std::memcpy(((void*)name24.data()), name8, ((unsigned long)16));

/// below is not automatically generated
int *a_c = ((int *)name24.data());
for (int i = 0; i < 4; i++) {
int res = i + i + 1;
int val = a_c[i];
assert(val == res);
}
printf("PASS\n");
}
keryell commented 2 years ago

I have just tried on my Ubuntu 21.10 laptop:

./reprod 
WARNING: XCLBIN used is generated with Vivado version 2022.1.0 where as it is run with the Vivado version 2021.2 which is not compatible. May result to weird behaviour.
INFO: [HW-EMU 07-0] Please refer the path "/tmp/.run/1735899/hw_em/device0/binary_0/behav_waveform/xsim/simulate.log" for more detailed simulation infos, errors and warnings.
INFO: [Common 17-206] Exiting xsim at Tue Apr 12 14:41:47 2022...
SIMULATION EXITED
# around 5 min later:
segmentation fault (core dumped)  ./reprod

So I guess there is a bug.

keryell commented 2 years ago

@Ralender I tried to compile your host code with g++, but it did not work because these includes are missing:

#include <cassert>
#include <cstring>

Can you fix your source code generator? With these lines added,

g++ -o reprod host.cpp -g -I/opt/xilinx/xrt/include -L/opt/xilinx/xrt/lib -lOpenCL -luuid -lxrt_coreutil

compiles, and execution fails the same way:

-<%>- time ./reprod                                                                                           
WARNING: XCLBIN used is generated with Vivado version 2022.1.0 where as it is run with the Vivado version 2021.2 which is not compatible. May result to weird behaviour.
INFO: [HW-EMU 07-0] Please refer the path "/tmp/.run/1736560/hw_em/device0/binary_0/behav_waveform/xsim/simulate.log" for more detailed simulation infos, errors and warnings.
INFO: [Common 17-206] Exiting xsim at Tue Apr 12 14:50:50 2022...
SIMULATION EXITED
[2]    1736560 segmentation fault (core dumped)  ./reprod
./reprod  0.56s user 0.09s system 0% cpu 5:00.12 total

Note that I have added a time command so you can see it fails in exactly 5 minutes! :-)

I am afraid that if this bug is not fixed, hw_emu will not work with the next Ubuntu LTS. :-(

venkatp-xilinx commented 2 years ago

@stsoe, @keryell: Can we have a CR for this against the Vitis_Emulation application, with details on how to set up "clang" to compile the host and reproduce the issue?