Xilinx / mlir-aie

An MLIR-based toolchain for AMD AI Engine-enabled devices.

post-build checks are not compiling #923

Closed medbzkst closed 9 months ago

medbzkst commented 9 months ago

I tried to launch check-aie and check-tutorials after finishing the build, and several errors occurred.

I then tried to compile one tutorial manually (tutorial-1) and got the following error:

aiecc.py -j4 --sysroot=/home/bouazim/tools/Xilinx/Vitis/2023.2/gnu/aarch64/lin/aarch64-linux/aarch64-xilinx-linux/ --host-target=aarch64-linux-gnu aie.mlir -I/home/bouazim/mlir-aie/build_release/runtime_lib/aarch64/test_lib/include -I/home/bouazim/mlir-aie/build_release/runtime_lib/aarch64/xaiengine/include --gcc-toolchain=/home/bouazim/tools/Xilinx/Vitis/2023.2/gnu/aarch64/lin/aarch64-linux/aarch64-xilinx-linux//usr -L/home/bouazim/mlir-aie/build_release/runtime_lib/aarch64/test_lib/lib -ltest_lib  -I/home/bouazim/tools/Xilinx/Vitis/2023.2/gnu/aarch64/lin/aarch64-linux/aarch64-xilinx-linux//usr/include/c++/12.2.0 -I/home/bouazim/tools/Xilinx/Vitis/2023.2/gnu/aarch64/lin/aarch64-linux/aarch64-xilinx-linux//usr/include/c++/12.2.0/aarch64-xilinx-linux -I/home/bouazim/tools/Xilinx/Vitis/2023.2/gnu/aarch64/lin/aarch64-linux/aarch64-xilinx-linux//usr/include -L/home/bouazim/tools/Xilinx/Vitis/2023.2/gnu/aarch64/lin/aarch64-linux/aarch64-xilinx-linux//usr/lib/aarch64-xilinx-linux/12.2.0 -B/home/bouazim/tools/Xilinx/Vitis/2023.2/gnu/aarch64/lin/aarch64-linux/aarch64-xilinx-linux//usr/lib/aarch64-xilinx-linux/12.2.0 ./test.cpp -o tutorial-1.exe
Found Vitis at /home/bouazim/tools/Xilinx/Vitis/2023.2
 AIE Compilation:     0% -:--:-- 0:00:00 0/2 4 Workers

clang++: error: no such file or directory: '/home/bouazim/mlir-aie/build_release/runtime_lib/aarch64/test_lib/lib/libmemory_allocator_ion.a'

Here are also a few sample lines from check-aie:

[90/91] Running the aie regression tests
xrt not found
Peano not found, but expected at  <unset>/bin
Looking for Chess...
Chess found: /home/bouazim/tools/Xilinx/Vitis/2023.2/aietools/bin/xchesscc
lit: /home/bouazim/.local/lib/python3.8/site-packages/lit/llvm/subst.py:126: note: Did not find ld.lld in /home/bouazim/mlir-aie/build_release/bin:/home/bouazim/mlir-aie/mlir/bin
lit: /home/bouazim/.local/lib/python3.8/site-packages/lit/main.py:71: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 600 seconds was requested on the command line. Forcing timeout to be 600 seconds
XPASS: AIE :: unit_tests/aie/31_stream_core/aie.mlir (1 of 760)
******************** TEST 'AIE :: unit_tests/aie/31_stream_core/aie.mlir' FAILED ********************
Script:
--
: 'RUN: at line 11';   /usr/bin/python3.8 /home/bouazim/mlir-aie/build_release/bin/aiecc.py  --host-target=x86_64-unknown-linux-gnu /home/bouazim/mlir-aie/test/unit_tests/aie/31_stream_core/aie.mlir -I/home/bouazim/mlir-aie/build_release/runtime_lib/x86_64/test_lib/include -L/home/bouazim/mlir-aie/build_release/runtime_lib/x86_64/test_lib/lib -ltest_lib /home/bouazim/mlir-aie/test/unit_tests/aie/31_stream_core/test.cpp -o test.elf
: 'RUN: at line 12';   echo ./test.elf
--
Exit Code: 0

Command Output (stdout):
--
 AIE Compilation:  100% 0:00:00 0:00:00 3/3 4 Workers
./test.elf

--

********************
FAIL: AIE :: Targets/AIEGenerateCDODirect/07_shim_dma_core_function_with_loop.mlir (2 of 760)
******************** TEST 'AIE :: Targets/AIEGenerateCDODirect/07_shim_dma_core_function_with_loop.mlir' FAILED ********************
Script:
--
: 'RUN: at line 6';   export BASENAME=$(basename /home/bouazim/mlir-aie/test/Targets/AIEGenerateCDODirect/07_shim_dma_core_function_with_loop.mlir)
: 'RUN: at line 7';   rm -rf *.elf* *.xclbin *.bin $BASENAME.cdo_direct $BASENAME.prj
: 'RUN: at line 8';   mkdir $BASENAME.prj && pushd $BASENAME.prj && "/usr/bin/python3.8" /home/bouazim/mlir-aie/build_release/bin/aiecc.py --aie-generate-cdo --no-compile-host --tmpdir $PWD /home/bouazim/mlir-aie/test/Targets/AIEGenerateCDODirect/07_shim_dma_core_function_with_loop.mlir && popd
: 'RUN: at line 9';   mkdir $BASENAME.cdo_direct && cp $BASENAME.prj/*.elf $BASENAME.cdo_direct
: 'RUN: at line 10';   /home/bouazim/mlir-aie/build_release/bin/aie-translate --aie-generate-cdo-direct $BASENAME.prj/input_physical.mlir --work-dir-path=$BASENAME.cdo_direct
: 'RUN: at line 11';   cmp $BASENAME.cdo_direct/aie_cdo_elfs.bin $BASENAME.prj/aie_cdo_elfs.bin
: 'RUN: at line 12';   cmp $BASENAME.cdo_direct/aie_cdo_enable.bin $BASENAME.prj/aie_cdo_enable.bin
: 'RUN: at line 13';   cmp $BASENAME.cdo_direct/aie_cdo_error_handling.bin $BASENAME.prj/aie_cdo_error_handling.bin
: 'RUN: at line 14';   cmp $BASENAME.cdo_direct/aie_cdo_init.bin $BASENAME.prj/aie_cdo_init.bin
--
Exit Code: 1

Command Output (stdout):
--
~/mlir-aie/build_release/test/Targets/AIEGenerateCDODirect/07_shim_dma_core_function_with_loop.mlir.prj ~/mlir-aie/build_release/test/Targets/AIEGenerateCDODirect
Generating: /home/bouazim/mlir-aie/build_release/test/Targets/AIEGenerateCDODirect/07_shim_dma_core_function_with_loop.mlir.prj/aie_cdo_error_handling.bin
Generating: /home/bouazim/mlir-aie/build_release/test/Targets/AIEGenerateCDODirect/07_shim_dma_core_function_with_loop.mlir.prj/aie_cdo_elfs.bin
 AIE Compilation:   100% 0:00:00 0:00:00 2/2 4 Workers

--
Command Output (stderr):
--
[AIE ERROR] XAie_LoadElfPartial():600: Unable to open elf file, 2: No such file or directory
ERROR: Failed to load elf for core(%d,%d)
Error encountered while running: /home/bouazim/mlir-aie/build_release/test/Targets/AIEGenerateCDODirect/07_shim_dma_core_function_with_loop.mlir.prj/cdo_main.out --work-dir-path /home/bouazim/mlir-aie/build_release/test/Targets/AIEGenerateCDODirect/07_shim_dma_core_function_with_loop.mlir.prj/

--

Any thoughts on which dependencies might have gotten lost along the way?

makslevental commented 9 months ago

There's not enough info in the report from check-aie to diagnose. Can you put ARGS "-vv --timeout 600" here? You can also try rerunning cmake with AIE_ENABLE_GENERATE_CDO_DIRECT=OFF.
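A verbose lit log can be condensed before posting by grouping the per-test result lines by status. The following is a minimal sketch, assuming the usual lit result-line shape seen in the log above (`STATUS: SUITE :: path (N of M)`); the function name `summarize_lit_log` is hypothetical and not part of lit or mlir-aie.

```python
import re
from collections import defaultdict

# Assumed shape of a lit result line, e.g.
#   FAIL: AIE :: Targets/foo.mlir (2 of 760)
RESULT_LINE = re.compile(
    r"^(?P<status>[A-Z]+): (?P<suite>.+?) :: (?P<test>\S+) \(\d+ of \d+\)$"
)


def summarize_lit_log(text):
    """Group test paths by lit status (FAIL, XPASS, PASS, ...)."""
    results = defaultdict(list)
    for line in text.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            results[match.group("status")].append(match.group("test"))
    return dict(results)


# The two result lines quoted in the report above.
sample = """\
XPASS: AIE :: unit_tests/aie/31_stream_core/aie.mlir (1 of 760)
FAIL: AIE :: Targets/AIEGenerateCDODirect/07_shim_dma_core_function_with_loop.mlir (2 of 760)
"""

summary = summarize_lit_log(sample)
for status, tests in sorted(summary.items()):
    print(status, len(tests), tests)
```

Lines that are not result lines (script echoes, stderr output such as `[AIE ERROR] ...`) do not match the pattern and are skipped.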

medbzkst commented 9 months ago

I see that a big part of what is happening is that clang is not finding the includes and libs of aarch64 under $MLIR-AIE/build/runtime_lib. Only x86_64 is there, whereas aarch64 is the one that is required with the tutorials. Given that I followed the Build and Test workflow (Release), isn't it expected to get that compiled?
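A quick way to confirm this is to list which target architectures were actually built under runtime_lib and whether the library named in the clang++ error is present. This is a minimal sketch, assuming the directory layout visible in the error message (`<build>/runtime_lib/<arch>/test_lib/lib`); the helper names are hypothetical, and the demo runs against a temporary directory rather than a real build tree.

```python
import tempfile
from pathlib import Path


def available_archs(build_dir):
    """Return the architecture directories present under <build>/runtime_lib."""
    root = Path(build_dir) / "runtime_lib"
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def missing_runtime_libs(build_dir, arch, lib_names):
    """Return the libraries from lib_names that are absent for the given arch."""
    lib_dir = Path(build_dir) / "runtime_lib" / arch / "test_lib" / "lib"
    return [name for name in lib_names if not (lib_dir / name).is_file()]


# Demo on a throwaway tree that mimics an x86_64-only build.
with tempfile.TemporaryDirectory() as build_dir:
    x86_lib = Path(build_dir) / "runtime_lib" / "x86_64" / "test_lib" / "lib"
    x86_lib.mkdir(parents=True)
    (x86_lib / "libmemory_allocator_ion.a").touch()

    wanted = ["libmemory_allocator_ion.a"]
    print("archs built:", available_archs(build_dir))
    print("missing for aarch64:", missing_runtime_libs(build_dir, "aarch64", wanted))
    print("missing for x86_64:", missing_runtime_libs(build_dir, "x86_64", wanted))
```

Against a real build, `build_dir` would be the build directory itself (build_release in the commands above).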

makslevental commented 9 months ago

> I see that a big part of what is happening is that clang is not finding the includes and libs of aarch64 under $MLIR-AIE/build/runtime_lib. Only x86_64 is there, whereas aarch64 is the one that is required with the tutorials. Given that I followed the Build and Test workflow (Release), isn't it expected to get that compiled?

The check-tutorials and check-aie issues are distinct. Unfortunately, I am not familiar with the cross-compilation paths (and thus the tutorials), but indeed they are not tested in CI (despite appearances), since the runners don't have Vitis etc. on them.

medbzkst commented 9 months ago

I understand. I kind of solved the issue here (although some tests still fail when I run check-aie).

Long story short, compiling by mimicking the CI is not the best option. The CI does help tremendously with setting up the compilation tools, which would otherwise cost hours (especially moving to g++-11 and getting exactly the right MLIR distro). However, once those are set up, the best option is to compile with ./utils/build-mlir-aie.sh, targeting either aarch64 or x86_64 here.

For reference, keeping it as aarch64 solved the problem. I, however, changed it to x86_64 so that it runs on my machine, and because I am targeting the VCK5000 rather than the VCK190 anyway.