Xilinx / mlir-aie

An MLIR-based toolchain for AMD AI Engine-enabled devices.

IRON programming example: matrix multiply ref design results are wrong #1589

Open hecmay opened 3 months ago

hecmay commented 3 months ago

Hi,

I am following the ASPLOS tutorial on a Minisforum UM790 Pro machine with an AMD Ryzen NPU. I was able to set up the Linux environment, the IPU driver, and all Vitis dependencies successfully.

However, when I tried the matrix multiplication (MM) reference example, the results appear to be wrong. The output from the single_core version follows. The whole_array version's output does not match the reference design either. Only the matrix-vector version worked.

rm -rf _build
mkdir -p _build
cd _build &&  cmake -E env CXXFLAGS="-std=c++23 -ggdb" cmake /home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/.. -D CMAKE_C_COMPILER=gcc-13 -D CMAKE_CXX_COMPILER=g++-13 -DTARGET_NAME=matrixMultiplication -Dsubdir=single_core
CMake Deprecation Warning at CMakeLists.txt:14 (cmake_minimum_required):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.

-- The C compiler identification is GNU 13.1.0
-- The CXX compiler identification is GNU 13.1.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc-13 - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++-13 - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Boost: /usr/lib/x86_64-linux-gnu/cmake/Boost-1.74.0/BoostConfig.cmake (found version "1.74.0")  
-- Configuring done (0.2s)
-- Generating done (0.0s)
-- Build files have been written to: /home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/_build
cd _build &&  cmake --build . --config Release
gmake[1]: Entering directory '/home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/_build'
gmake[2]: Entering directory '/home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/_build'
gmake[3]: Entering directory '/home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/_build'
gmake[3]: Leaving directory '/home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/_build'
gmake[3]: Entering directory '/home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/_build'
[ 33%] Building CXX object CMakeFiles/matrixMultiplication.dir/home/user/mlir-aie-test/runtime_lib/test_lib/test_utils.cpp.o
[ 66%] Building CXX object CMakeFiles/matrixMultiplication.dir/single_core/test.cpp.o
[100%] Linking CXX executable matrixMultiplication
gmake[3]: Leaving directory '/home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/_build'
[100%] Built target matrixMultiplication
gmake[2]: Leaving directory '/home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/_build'
gmake[1]: Leaving directory '/home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/_build'
cp _build/matrixMultiplication matrixMultiplication.exe 
mkdir -p build
python3 /home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/aie2.py -m 64 -k 64 -n 64 -M 256 -K 256 -N 256 > build/aie_256x256x256_64x64x64.mlir
mkdir -p build
cd build && xchesscc_wrapper aie2 -I /tools/Xilinx/Vitis/2023.2/aietools/include  -DBIT_WIDTH=8 -DDIM_M=64 -DDIM_K=64 -DDIM_N=64 -c /home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/../../../../aie_kernels/aie2/mm.cc -o mm_64x64x64.o
/home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/../../../../aie_kernels/aie2/mm.cc:11:9: warning: '__AIENGINE__' macro redefined [-Wmacro-redefined]
#define __AIENGINE__ 2
        ^
<command line>:3:9: note: previous definition is here
#define __AIENGINE__ 1
        ^
In file included from /home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/../../../../aie_kernels/aie2/mm.cc:23:
In file included from /tools/Xilinx/Vitis/2023.2/aietools/include/aie_api/aie.hpp:10185:
In file included from /tools/Xilinx/Vitis/2023.2/aietools/include/aie_api/aie_adf.hpp:75:
In file included from /tools/Xilinx/Vitis/2023.2/aietools/include/aie_api/adf/stream.hpp:54:
In file included from /tools/Xilinx/Vitis/2023.2/aietools/include/adf.h:7:
/tools/Xilinx/Vitis/2023.2/aietools/include/adf/intrinsics.h:28:9: warning: 'REL_WRITE' macro redefined [-Wmacro-redefined]
#define REL_WRITE -1
        ^
/home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/../../../../aie_kernels/aie2/mm.cc:20:9: note: previous definition is here
#define REL_WRITE 0
        ^
2 warnings generated.
Warning in "": (imprecise line-number, the error occurred somewhere in this function): cannot move keep_with_operand operation `v16acc64 ups_w2c(v16int16, uint6_t, uint1_t, uint1_t, uint2_t, bool &)' <344>, would violate structural limitations [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): ... intended destination [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): cannot move keep_with_operand operation `v16acc64 ups_w2c(v16int16, uint6_t, uint1_t, uint1_t, uint2_t, bool &)' <346>, would violate structural limitations [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): ... intended destination [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): cannot move keep_with_operand operation `v16acc64 ups_w2c(v16int16, uint6_t, uint1_t, uint1_t, uint2_t, bool &)' <348>, would violate structural limitations [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): ... intended destination [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): cannot move keep_with_operand operation `v16acc64 ups_w2c(v16int16, uint6_t, uint1_t, uint1_t, uint2_t, bool &)' <350>, would violate structural limitations [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): ... intended destination [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): cannot move keep_with_operand operation `v16acc64 ups_w2c(v16int16, uint6_t, uint1_t, uint1_t, uint2_t, bool &)' <344>, would violate structural limitations [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): ... intended destination [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): cannot move keep_with_operand operation `v16acc64 ups_w2c(v16int16, uint6_t, uint1_t, uint1_t, uint2_t, bool &)' <346>, would violate structural limitations [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): ... intended destination [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): cannot move keep_with_operand operation `v16acc64 ups_w2c(v16int16, uint6_t, uint1_t, uint1_t, uint2_t, bool &)' <348>, would violate structural limitations [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): ... intended destination [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): cannot move keep_with_operand operation `v16acc64 ups_w2c(v16int16, uint6_t, uint1_t, uint1_t, uint2_t, bool &)' <350>, would violate structural limitations [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): ... intended destination [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): loop found to have 4 iterations, fewer than the explicitly annotated minimum 8 [-Wincorrect-annotation]
Warning: : (loop #8)
        Non leaf loop was prepared for pipelining. But the pipelined solutions have not been selected.
        Consider removing the chess_prepare_for_pipelining directive as it may improve results
mkdir -p build
cd build && aiecc.py --aie-generate-cdo --no-compile-host --xclbin-name=final_256x256x256_64x64x64.xclbin \
            --aie-generate-npu --npu-insts-name=insts_256x256x256_64x64x64.txt ../build/aie_256x256x256_64x64x64.mlir
warning: overriding the module target triple with pdarch-unknown-unknown-elf [-Woverride-module]
1 warning generated.
Warning in "": (imprecise line-number, the error occurred somewhere in this function): loop with essential overflow in loop count computation (number of iterations exceeds internal maximum) [-Wloop-count-overflow]

****** Bootgen v2024.1
  **** Build date : Jun 18 2024-22:04:45
    ** Copyright 1986-2022 Xilinx, Inc. All Rights Reserved.
    ** Copyright 2022-2024 Advanced Micro Devices, Inc. All Rights Reserved.

[INFO]   : Bootimage generated successfully

XRT Build Version: 2.18.0 (HEAD)
       Build Date: 2024-07-01 14:52:40
          Hash ID: 73fe5440974fc51ccaba6366094e4bfa8151f79a
Creating a default 'in-memory' xclbin image.

Section: 'MEM_TOPOLOGY'(6) was successfully added.
Size   : 88 bytes
Format : JSON
File   : '/home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/build/aie_256x256x256_64x64x64.mlir.prj/mem_topology.json'

Section: 'AIE_PARTITION'(32) was successfully added.
Size   : 12560 bytes
Format : JSON
File   : '/home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/build/aie_256x256x256_64x64x64.mlir.prj/aie_partition.json'
Info: Embedded Metadata section is missing project.platform.device.core element, adding it.
Successfully wrote (18589 bytes) to the output file: final_256x256x256_64x64x64.xclbin
Leaving xclbinutil.
 AIE Compilation: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:01 2/2 4 Workers
Generating: /home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/build/aie_256x256x256_64x64x64.mlir.prj/aie_cdo_elfs.bin
Generating: /home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/build/aie_256x256x256_64x64x64.mlir.prj/aie_cdo_init.bin
Generating: /home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/build/aie_256x256x256_64x64x64.mlir.prj/aie_cdo_enable.bin
export XRT_HACK_UNSECURE_LOADING_XCLBIN=1 && \
 ./matrixMultiplication.exe -x build/final_256x256x256_64x64x64.xclbin -i build/insts_256x256x256_64x64x64.txt -k MLIR_AIE -M 256 -K 256 -N 256 -v 2 --warmup 1 --iters 1
Matrix size 256x256x256
Sequence instr count: 278
Loading xclbin: build/final_256x256x256_64x64x64.xclbin
Kernel opcode: MLIR_AIE
Name: MLIR_AIE
Registering xclbin: build/final_256x256x256_64x64x64.xclbin
Getting hardware context.
Getting handle to kernel:MLIR_AIE
Writing data into buffer objects.
A = 
    3.53      3.33      0.56      1.59      0.50   ...     0.59      1.92      1.36      1.98      2.33  
    3.55      3.38      3.45      3.70      2.58   ...     0.88      0.92      1.83      1.84      0.10  
    3.81      1.03      2.38      0.70      2.00   ...     3.34      3.78      3.06      3.69      2.83  
    0.45      3.11      0.21      0.88      1.32   ...     2.92      0.66      0.50      0.61      0.12  
    1.48      3.42      0.21      3.56      2.23   ...     2.12      2.22      1.54      1.88      3.33  
    ...       ...       ...       ...       ...    ...     ...       ...       ...       ...       ...   
    3.72      2.95      0.41      0.58      1.48   ...     2.20      2.34      3.14      0.74      2.08  
    3.75      1.64      2.86      0.95      3.70   ...     3.39      2.27      1.22      1.20      1.34  
    3.95      0.57      1.95      0.05      0.11   ...     0.74      0.05      2.12      1.55      3.98  
    2.80      2.81      1.49      1.01      1.28   ...     0.57      0.34      3.39      2.62      2.75  
    2.50      1.41      0.57      3.41      1.20   ...     3.52      3.27      0.61      3.11      3.42  
B = 
    3.75      1.54      1.69      1.72      3.64   ...     3.06      1.29      1.19      3.28      1.34  
    1.14      1.06      0.57      0.12      2.27   ...     3.28      3.94      3.89      3.25      1.56  
    0.07      3.58      2.45      3.48      3.94   ...     1.16      1.06      2.17      2.17      3.30  
    2.31      0.21      3.45      2.09      2.47   ...     0.78      3.30      2.53      0.82      3.20  
    3.56      1.68      2.25      2.16      2.59   ...     3.12      3.94      2.97      3.23      2.06  
    ...       ...       ...       ...       ...    ...     ...       ...       ...       ...       ...   
    3.38      3.83      1.18      2.48      3.50   ...     1.02      3.92      1.11      3.06      1.11  
    1.73      1.41      1.59      3.47      1.08   ...     2.22      1.18      3.33      1.08      3.08  
    0.62      2.20      0.60      0.57      2.62   ...     2.36      3.34      1.81      0.15      1.60  
    0.96      0.79      3.92      0.95      1.16   ...     0.06      3.02      1.02      1.78      2.42  
    2.09      2.58      3.58      1.59      1.55   ...     1.66      3.44      3.34      1.09      3.56  
Running Kernel (iteration 0).
Running Kernel (iteration 1).
Verifying against reference matmul ...
[   64,     6] 1024.00 =!= 916.00
[   64,     7] 1032.00 =!= 912.00
[   64,    12] 1040.00 =!= 904.00
[   64,    19] 1088.00 =!= 956.00
[   64,    20] 1032.00 =!= 928.00
[   64,    23] 1128.00 =!= 1000.00
[   64,    24] 1128.00 =!= 996.00
[   64,    28] 1080.00 =!= 972.00
[   64,    30] 1112.00 =!= 984.00
[   64,    31] 1128.00 =!= 1008.00
[   64,    33] 1024.00 =!= 912.00
[   64,    35] 1056.00 =!= 940.00
[   64,    46] 1112.00 =!= 996.00
[   64,    47] 988.00 =!= 892.00
[   64,    49] 1088.00 =!= 980.00
[   64,    50] 1128.00 =!= 992.00
[   64,    54] 1136.00 =!= 1012.00
[   64,    56] 1048.00 =!= 940.00
[   64,    58] 1112.00 =!= 996.00
[   64,    64] 1048.00 =!= 936.00
[   64,    68] 1136.00 =!= 1016.00
[   64,    74] 1080.00 =!= 968.00
[   64,    79] 1024.00 =!= 900.00
[   64,    90] 1072.00 =!= 956.00
[   64,    92] 1048.00 =!= 948.00
[   64,    95] 1096.00 =!= 988.00
[   64,    97] 1048.00 =!= 944.00
[   64,    99] 1040.00 =!= 928.00
[   64,   133] 952.00 =!= 852.00
[   64,   156] 1120.00 =!= 1012.00
[   64,   163] 1072.00 =!= 944.00
[   64,   174] 1080.00 =!= 876.00
...and 3407 further errors.
Maximum relative error:  21%

Reference:
 1008.00   1016.00    984.00    980.00    956.00   ...   936.00   1080.00   1096.00   1064.00   1040.00  
 1024.00    948.00    944.00    948.00    904.00   ...   928.00   1048.00   1032.00   1072.00   1004.00  
  996.00    944.00    940.00    956.00    928.00   ...   872.00   1072.00   1024.00   1032.00   1000.00  
  940.00    912.00    880.00    912.00    900.00   ...   912.00   1032.00   1032.00   1020.00    956.00  
 1012.00    984.00    988.00    956.00    952.00   ...   936.00   1088.00   1064.00   1040.00   1056.00  
    ...       ...       ...       ...       ...    ...     ...       ...       ...       ...       ...   
  996.00    940.00    924.00    920.00    896.00   ...   924.00   1020.00    996.00   1032.00   1004.00  
 1056.00   1008.00    988.00   1016.00    996.00   ...   960.00   1104.00   1160.00   1080.00   1112.00  
  948.00    980.00    976.00    924.00    924.00   ...   896.00   1040.00   1016.00   1040.00   1056.00  
 1088.00   1016.00    996.00   1020.00    976.00   ...   960.00   1088.00   1104.00   1120.00   1080.00  
 1048.00   1020.00   1020.00   1048.00   1024.00   ...   952.00   1152.00   1112.00   1112.00   1112.00  

Output:
 1004.00   1008.00    980.00    972.00    948.00   ...   932.00   1072.00   1080.00   1056.00   1040.00  
 1016.00    944.00    936.00    944.00    896.00   ...   924.00   1040.00   1020.00   1064.00   1000.00  
  988.00    940.00    928.00    952.00    920.00   ...   864.00   1056.00   1020.00   1024.00    996.00  
  936.00    908.00    872.00    904.00    892.00   ...   904.00   1024.00   1024.00   1012.00    952.00  
 1004.00    976.00    980.00    952.00    944.00   ...   928.00   1080.00   1056.00   1032.00   1048.00  
    ...       ...       ...       ...       ...    ...     ...       ...       ...       ...       ...   
 1000.00    952.00    916.00    976.00    984.00   ...   900.00   1056.00   1032.00   1040.00   1056.00  
  928.00    908.00    936.00    916.00    920.00   ...   872.00   1032.00    988.00    996.00    996.00  
  988.00    968.00   1020.00    996.00    968.00   ...   924.00   1072.00   1048.00   1072.00   1024.00  
 1056.00   1032.00   1032.00    988.00   1000.00   ...   996.00   1096.00   1104.00   1112.00   1088.00  
 1056.00    996.00    940.00    988.00    980.00   ...   948.00   1072.00   1096.00   1128.00   1056.00  
Verify time: 0.00 s.

Avg NPU matmul time: 834.00us.
Avg NPU gflops: 40.23

Min NPU matmul time: 834.00us.
Max NPU gflops: 40.23

Max NPU matmul time: 834.00us.
Min NPU gflops: 40.23

Error count: 3439

Failed.

make: *** [/home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/../makefile-common:87: run] Error 1
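
One way to narrow this down is to group the mismatch coordinates by output tile. The 32 mismatches shown in the log are all in row 64, which is the first row of the second 64-row tile for the `-m 64 -n 64` build above. The sketch below is not part of the reference design: `mismatches`, `count_by_tile`, and the 5% tolerance are hypothetical names and values chosen for illustration, and the actual check in test.cpp may differ.

```python
from collections import Counter


def mismatches(output, reference, rel_tol=0.05):
    """Return (row, col, got, expected) for entries outside a relative tolerance.

    rel_tol is an assumed value, not the tolerance used by the reference test.
    """
    bad = []
    for r, (out_row, ref_row) in enumerate(zip(output, reference)):
        for c, (got, exp) in enumerate(zip(out_row, ref_row)):
            if abs(got - exp) > rel_tol * abs(exp):
                bad.append((r, c, got, exp))
    return bad


def count_by_tile(coords, tile_m=64, tile_n=64):
    """Count mismatches per (tile_row, tile_col) for an m x n output tiling."""
    return Counter((r // tile_m, c // tile_n) for r, c in coords)


# A few (row, col) pairs copied from the failure list in the log.
sample = [(64, 6), (64, 7), (64, 64), (64, 133), (64, 174)]
tile_counts = count_by_tile(sample)
print(dict(tile_counts))
```

If the full list of mismatches clusters in particular tiles rather than spreading evenly, that would point toward data movement or tiling rather than the arithmetic in the kernel itself.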
makslevental commented 3 months ago

Related: https://github.com/Xilinx/mlir-aie/issues/1554