UCLA-VAST / AutoSA

AutoSA: Polyhedral-Based Systolic Array Compiler
MIT License

Deadlock in large size GEMM SA #4

Status: Closed (hecmay closed this issue 3 years ago)

hecmay commented 3 years ago

I am able to compile and run a small-size GEMM with AutoSA. However, with a large-size GEMM, I keep running into a deadlock both in emulation and on real hardware.

```
(ubuntu) [sx233@brg-zhang-xcel temp.autosa.large-u280-2020.2]$ XCL_EMULATION_MODE=sw_emu ./host build_dir.sw_emu.xilinx_u280_xdma_201920_3/kernel.xclbin
Found Platform
Platform Name: Xilinx
Found Device=xilinx_u280_xdma_201920_3
Loading: build_dir.sw_emu.xilinx_u280_xdma_201920_3/kernel.xclbin
```

whbldhwj commented 3 years ago

Can you let me know the command you used to generate the array? I can run it on our U280 and try to reproduce the issue.

hecmay commented 3 years ago

@whbldhwj Thanks! I was using this example with exactly the same command: https://github.com/UCLA-VAST/AutoSA/tree/master/autosa_tests/large/mm

I used my own Makefile, which is missing a few options compared with the Makefile provided by AutoSA, but I assume that is fine? I can try again using AutoSA's Makefile and connectivity.cfg.

whbldhwj commented 3 years ago

I see. I thought that design couldn't be mapped onto the U280. It uses 8320 DSPs, i.e. 8320/9024 = 92% DSP utilization. In my experience, Vitis usually fails routing once DSP usage goes above 70%. Did you manage to route the design? Which version of Vitis are you using? I'll try this one later today as well.
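The utilization arithmetic in the comment above can be written out as a quick pre-build sanity check. This is a minimal sketch: the 9024 DSP total and the 70% threshold are the figures quoted in that comment, the threshold is an empirical rule of thumb rather than a documented Vitis limit, and the function names are hypothetical.

```python
# Rule-of-thumb DSP routability check using the figures quoted in this thread.
# ASSUMPTION: the 70% threshold is an experience-based heuristic, not a Vitis limit.

U280_DSP_SLICES = 9024          # total DSP count quoted for the U280
ROUTING_RISK_THRESHOLD = 0.70   # "over 70%" heuristic from the comment


def dsp_utilization(used: int, available: int = U280_DSP_SLICES) -> float:
    """Return the fraction of DSP slices consumed by a design."""
    if available <= 0:
        raise ValueError("available must be positive")
    return used / available


def likely_routing_risk(used: int, available: int = U280_DSP_SLICES,
                        threshold: float = ROUTING_RISK_THRESHOLD) -> bool:
    """Return True if DSP usage exceeds the heuristic routing threshold."""
    return dsp_utilization(used, available) > threshold


if __name__ == "__main__":
    used = 8320  # DSPs reported for the large mm design
    util = dsp_utilization(used)
    print(f"{used}/{U280_DSP_SLICES} = {util:.0%} DSP usage")
    print("above 70% heuristic:", likely_routing_risk(used))
```

Running it reproduces the 92% figure and flags the design as above the heuristic; a design that actually routes despite exceeding the threshold (as reported later in this thread) simply means the rule of thumb is conservative.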

hecmay commented 3 years ago

I tried the large mm example (floating point, SIMD=1) on the U280, and it runs: the device execution time is 0.026 s.

The HCL large mm (int) example still hangs on hardware. I am currently trying the floating-point version to see whether it works.

hecmay commented 3 years ago

@whbldhwj The FP version of HCL MM still deadlocks at large problem sizes. In emulation, I can see a warning message from Vitis (I am using 2020.2).