Xilinx / XRT

Run Time for AIE and FPGA based platforms
https://xilinx.github.io/XRT

Thread-safety issue with enqueueMapBuffer() and parallel buffer allocation on Alveo boards #3953

Closed: zohourih closed this issue 4 years ago

zohourih commented 4 years ago

While debugging a "Bus error (core dumped)" crash on our Alveo U50 board, I found what appears to be a thread-safety issue in XRT: calling the enqueueMapBuffer() API and then performing OpenMP-parallelized buffer allocation through the returned host pointer crashes, and the problem is easy to reproduce. With OpenMP disabled, the same code runs without any issues. The OpenMP-parallelized version also runs fine on MPSoC boards, where memory is shared between the FPGA and the ARM CPU and no host-to-device buffer copy is needed. It seems XRT does not correctly handle multiple OpenMP threads when data has to be explicitly copied from the host buffer to the device over the PCIe bus, as on Alveo boards.

To reproduce the issue, add #pragma omp parallel for to line 131 of the Vitis Accel vadd example (or any other example that uses this type of buffer allocation and data movement), add -fopenmp to CXXFLAGS in the Makefile, then compile and run. This reliably results in a "Bus error (core dumped)" crash. This looks like a serious thread-safety issue in XRT: according to the OpenCL specification, every OpenCL API function except clSetKernelArg() is supposed to be thread-safe.
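A minimal sketch of the loop change described above, assuming (as in the vadd host code of that era) that the loop at line 131 initializes the input vectors through the pointers returned by enqueueMapBuffer(). The helper name fill_inputs is hypothetical, and in a standalone build plain arrays can stand in for the mapped pointers, so this shows where the pragma goes rather than reproducing the crash itself.

```cpp
// Hypothetical helper mirroring the input-initialization loop of the vadd
// host code. In the real example, a and b would be the host pointers returned
// by q.enqueueMapBuffer(buffer_in1, CL_TRUE, CL_MAP_WRITE, 0, size_in_bytes)
// and the matching call for buffer_in2 (assumed names), i.e. memory mapped
// from device buffers.
void fill_inputs(int* a, int* b, int n) {
    // Built with -fopenmp, this pragma splits the iterations across threads,
    // so several threads write into the mapped buffers at the same time.
    // Without -fopenmp the pragma is ignored and the loop runs serially.
#pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        // Deterministic values instead of rand(), which the C standard does
        // not require to be thread-safe.
        a[i] = i;
        b[i] = 2 * i;
    }
}
```

In the actual host program the call would sit right after the two enqueueMapBuffer() calls, e.g. fill_inputs(ptr_a, ptr_b, DATA_SIZE) (ptr_a, ptr_b and DATA_SIZE being assumed names); only adding -fopenmp makes the loop multithreaded.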

Tested with Ubuntu 18.04, Alveo U50, XRT 2.7.766 (taken from here), both Vitis 2019.2 and 2020.1.

stsoe commented 4 years ago

Thank you for this bug report. The issue is not OpenCL-specific. It is fixed by #4297.

zohourih commented 4 years ago

Thank you for the fix.