Open lforg37 opened 2 years ago
Hi @akasat Please help assign this issue properly. Not sure why you removed the assignment without assigning someone else?
I created a work-around for this in https://github.com/Xilinx/XRT/pull/6269
This is tracked internally with https://jira.xilinx.com/browse/CR-1120194 and there is a non-SYCL pure XRT & HLS reproducer example in https://jira.xilinx.com/browse/XRT-937
@sgundime-xilinx Identified the issue. Fix is in progress. The order of messageThread and unix_socket creation is updated. With this fix, we are not seeing any crash or segfault. Will create the PR shortly.
What was the PR fixing this?
The issue was resolved by introducing a periodic monitoring flag. Read/write calls are now guarded by this flag before the actual call is made. If a client or server gets disconnected, the thread is notified through the flag. CR-1120194 tracked this issue and has been resolved as well.
PR: https://github.com/Xilinx/XRT/pull/6623
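For readers who want a picture of the flag-guarded approach described above, here is a minimal sketch. It is not the code from the PR: `guarded_socket`, `check_peer`, and `guarded_read` are hypothetical names, and using `poll` to detect a disconnected peer is an assumption about how such a monitor could work.

```cpp
#include <atomic>
#include <cstddef>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical wrapper (not the XRT class): I/O is refused once the
// monitor has seen the peer disconnect.
struct guarded_socket {
  int fd;
  std::atomic<bool> alive{true};

  explicit guarded_socket(int f) : fd(f) {}

  // Intended to be called periodically from a monitor thread.
  // With events == 0, poll still reports POLLHUP/POLLERR/POLLNVAL.
  void check_peer()
  {
    pollfd p{fd, 0, 0};
    if (poll(&p, 1, 0) > 0 && (p.revents & (POLLHUP | POLLERR | POLLNVAL)))
      alive.store(false);
  }

  // Check the flag before making the real read call.
  ssize_t guarded_read(void* buf, size_t n)
  {
    if (!alive.load())
      return -1;            // peer is gone: do not block in read()
    return read(fd, buf, n);
  }
};
```

In this sketch the reader thread never enters `read` once the flag has been cleared, so a disconnected peer results in an error return rather than a hang.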
It seems that `unix_socket::sk_read` in `runtime_src/core/pcie/emulation/common_em/unix_socket.cxx` does not take into account the possibility of having less data on the socket than required. The `(r = read(fd, buf + rlen, count - rlen)) < 0` condition will never be true if the socket is closed (`read` returns 0, so 0 is assigned to `r`), producing an infinite loop.
This behaviour has been observed on standard code. I have not found why the socket sometimes contains less information than expected. The same program can freeze or not depending on the execution, so it seems there is a race condition here.
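To illustrate the point, a read loop that treats a return value of 0 as end-of-file could look roughly like the sketch below. This is not the XRT implementation: `read_full` is a hypothetical helper, and its return convention (bytes actually read, or -1 on error) is an assumption made for the example.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: try to read exactly `count` bytes from `fd`.
// Returns the number of bytes read (fewer than `count` means the peer
// closed the connection), or -1 on error with errno set.
ssize_t read_full(int fd, void* vbuf, size_t count)
{
  assert(vbuf != nullptr || count == 0);
  char* buf = static_cast<char*>(vbuf);
  size_t rlen = 0;
  while (rlen < count) {
    ssize_t r = read(fd, buf + rlen, count - rlen);
    if (r < 0) {
      if (errno == EINTR)
        continue;           // interrupted by a signal: retry
      return -1;            // real error
    }
    if (r == 0)
      break;                // EOF: stop instead of looping forever
    rlen += static_cast<size_t>(r);
  }
  return static_cast<ssize_t>(rlen);
}
```

A caller can compare the result with `count` to detect a short read and decide how to report the disconnection.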
XRT version: 4c83637fd4d4041a5cd4872a1391f812e54e143e
Alveo platform: xilinx_u200_gen3x16_xdma_1_202110_1
Stack trace when blocked: