Xilinx / mlir-aie

An MLIR-based toolchain for AMD AI Engine-enabled devices.

ObjectFifo - Assertion Error in Nested Loop with Only One Iteration #1547

Closed by andrej 3 months ago

andrej commented 3 months ago

I'm opening this issue to document an error (and a workaround, see below) that I'm seeing with the ObjectFifo loop unrolling:

Summary

Error

/usr/include/c++/11/bits/stl_vector.h:1045: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = xilinx::AIE::BufferOp*; _Alloc = std::allocator<xilinx::AIE::BufferOp*>; std::vector<_Tp, _Alloc>::reference = xilinx::AIE::BufferOp*&; std::vector<_Tp, _Alloc>::size_type = long unsigned int]: 
Assertion '__n < this->size()' failed.
Aborted (core dumped)

Compilation Command

aiecc.py --aie-generate-cdo --no-compile-host --xclbin-name=bug.xclbin \
                         --aie-generate-npu --npu-insts-name=bug.txt bug.mlir

Code

What is unusual about this code is that one of the loops (the middle one, over %arg1) has only a single iteration. If we give that loop multiple iterations, the error does not happen. The error also does not happen when we have only two nested loops instead of three.

module {
  aie.device(npu1_4col) {

    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %c4 = arith.constant 4 : index
    %c4294967295 = arith.constant 4294967295 : index

    %tile_0_1 = aie.tile(0, 1)
    %tile_0_2 = aie.tile(0, 2)

    aie.objectfifo @fifoA(%tile_0_2, {%tile_0_1}, 2 : i32) : !aie.objectfifo<memref<64x64xbf16>>
    aie.objectfifo @fifoB(%tile_0_1, {%tile_0_2}, 2 : i32) : !aie.objectfifo<memref<64x64xbf16>>

    %core_0_2 = aie.core(%tile_0_2) {

      scf.for %arg0 = %c0 to %c4294967295 step %c1 {
        scf.for %arg1 = %c0 to %c1 step %c1 {
          %0 = aie.objectfifo.acquire @fifoA(Produce, 1) : !aie.objectfifosubview<memref<64x64xbf16>>
          %1 = aie.objectfifo.subview.access %0[0] : !aie.objectfifosubview<memref<64x64xbf16>> -> memref<64x64xbf16>
          scf.for %arg2 = %c0 to %c4 step %c1 {
            %2 = aie.objectfifo.acquire @fifoB(Consume, 1) : !aie.objectfifosubview<memref<64x64xbf16>>
            %3 = aie.objectfifo.subview.access %2[0] : !aie.objectfifosubview<memref<64x64xbf16>> -> memref<64x64xbf16>
            aie.objectfifo.release @fifoB(Consume, 1)
          }
          aie.objectfifo.release @fifoA(Produce, 1)
        }
      }

      aie.end

    }
  }
}

Alternative error

If we remove the two aie.objectfifo.subview.access statements, the error instead becomes:

/home/github/actions-runner/_work/mlir-aie/mlir-aie/mlir/src/python/MLIRPythonExtension.Core/IRModule.h:433:
mlir::python::PyMlirContext::ErrorCapture::~ErrorCapture(): Assertion `errors.empty() && "unhandled captured errors"' failed.
Aborted (core dumped)

Workaround

In the Python code that generates the MLIR, check whether a loop would have only a single iteration. If so, do not emit the loop; emit its body directly instead.
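A minimal sketch of this workaround, assuming the generator routes every loop through one helper and that the loop bounds are known Python integers at generation time. The names trip_count, emit_loop, emit_body and emit_for are hypothetical and not part of mlir-aie; emit_for stands in for whatever the generator already uses to create an scf.for.

```python
from typing import Any, Callable


def trip_count(start: int, stop: int, step: int = 1) -> int:
    """Number of iterations of `for i = start; i < stop; i += step`."""
    if step <= 0:
        raise ValueError("step must be positive")
    if stop <= start:
        return 0
    # Ceiling division of the range length by the step.
    return (stop - start + step - 1) // step


def emit_loop(
    start: int,
    stop: int,
    step: int,
    emit_body: Callable[[Any], None],
    emit_for: Callable[[int, int, int, Callable[[Any], None]], None],
) -> None:
    """Emit a loop, but skip the loop op itself for a single iteration.

    emit_body(iv) -- hypothetical callback that emits the loop body;
                     iv is the induction variable (or a constant).
    emit_for(...) -- hypothetical callback that emits a real loop op
                     and calls emit_body inside it.
    """
    n = trip_count(start, stop, step)
    if n == 0:
        return  # nothing to emit
    if n == 1:
        # Single iteration: emit the body inline, with the induction
        # variable replaced by the (constant) start value.
        emit_body(start)
        return
    emit_for(start, stop, step, emit_body)
```

With this in place, the middle loop from the example above (0 to 1, step 1) would be emitted as its body only, while the inner loop (0 to 4, step 1) would still go through emit_for.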

cc @AndraBisca

andrej commented 3 months ago

I just realized I already reported something very similar in #1128. This is probably the same issue. I will add the minimal example from this issue as a comment on #1128 and close this one.