Xilinx / mlir-aie

An MLIR-based toolchain for AMD AI Engine-enabled devices.

Execution of all example kernels results in `ERT_CMD_STATE_ERROR` on the Windows host #1757

Open · fxmarty-amd opened this issue 2 weeks ago

fxmarty-amd commented 2 weeks ago

Hi,

I have been trying to use MLIR-AIE with the Ryzen AI NPU on a Windows laptop (with WSL Ubuntu 22.04), so far without success, using the examples in https://github.com/Xilinx/mlir-aie/tree/main/programming_examples/basic.

Building the AIE design in WSL works; it is running the resulting .exe on the Windows side that fails. Specifically, the code returned by `kernel.wait(...)` is always 5, i.e. `ERT_CMD_STATE_ERROR`, for example in https://github.com/Xilinx/mlir-aie/blob/fe0c224fb68170d688075902cf8266ead1e1cdcd/programming_examples/basic/passthrough_dmas/test.cpp#L170.

I have tried installing mlir-aie both from the provided wheels and from source (LLVM + mlir-aie); neither worked. I am using Vitis 2023.2 and followed these install instructions: https://github.com/Xilinx/mlir-aie/blob/main/docs/buildHostWin.md (wheels) and https://github.com/Xilinx/mlir-aie/blob/main/docs/Building.md (from source).

Regarding the driver: running `(Get-WmiObject -Class Win32_PnPSignedDriver | Where-Object { $_.DeviceName -eq "AMD IPU Device" }).DriverVersion` in PowerShell returns `10.106.8.62`.

One thing I found suspicious: running `gendef xrt_coreutil.dll` in `/mnt/c/Technical/xrtNPUfromDLL` prints the following logs:

```
 * [xrt_coreutil.dll] Found PE+ image
 *** get_primary_data_type '$$Q' unknown
```

but an `xrt_coreutil.def` file is created nonetheless.

Note also that running mlir-aie through the Riallto wrapper (following the https://riallto.ai/install-riallto-windows.html#install-riallto-windows notebook) did work, so I assume this is not a BIOS/Secure Boot issue.

On the Windows host side, I use, for example, the following:

```
cd buildMSVS
cmake .. -G "Visual Studio 17 2022" -A x64 -DTARGET_NAME=passThroughDMAs
cmake --build . --config Release
.\passThroughDMAs.exe -x ..\build\final.xclbin -i ..\build\insts.txt -k MLIR_AIE -l 4096
```

Printing the result code gives `result code: 5`, and the subsequent output check fails. The output is not updated, as if the kernel never ran (which is what I would expect given the error).

Do you have any idea where the issue might come from, or is there any information about my setup I could share? Perhaps something more is needed beyond cloning https://github.com/Xilinx/XRT (for instance under `C:\Technical`) and running `git checkout 2023.2`? cc @stephenneuendorffer @fifield @jgmelber

Thank you!

fxmarty-amd commented 2 weeks ago

Any suggestions are welcome!