andrej opened this issue 3 months ago
I just updated my comment above with the command to compile the failing example after the changes in #1056.
I ran into this again and dug a little deeper to get a minimal reproducible example, shown below. It should be much easier to debug than the full matrix multiplication design.
Summary
Error
/usr/include/c++/11/bits/stl_vector.h:1045: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = xilinx::AIE::BufferOp*; _Alloc = std::allocator<xilinx::AIE::BufferOp*>; std::vector<_Tp, _Alloc>::reference = xilinx::AIE::BufferOp*&; std::vector<_Tp, _Alloc>::size_type = long unsigned int]:
Assertion '__n < this->size()' failed.
Aborted (core dumped)
Compilation Command
aiecc.py --aie-generate-cdo --no-compile-host --xclbin-name=bug.xclbin \
--aie-generate-npu --npu-insts-name=bug.txt bug.mlir
Code
What is unusual about this code is that the middle loop (%arg1) has only a single iteration. If it has multiple iterations, the error does not occur. The error also does not occur with only two nested loops instead of three.
module {
  aie.device(npu1_4col) {
    %tile_0_1 = aie.tile(0, 1)
    %tile_0_2 = aie.tile(0, 2)
    aie.objectfifo @fifoA(%tile_0_2, {%tile_0_1}, 2 : i32) : !aie.objectfifo<memref<64x64xbf16>>
    aie.objectfifo @fifoB(%tile_0_1, {%tile_0_2}, 2 : i32) : !aie.objectfifo<memref<64x64xbf16>>
    %core_0_2 = aie.core(%tile_0_2) {
      %c0 = arith.constant 0 : index
      %c1 = arith.constant 1 : index
      %c4 = arith.constant 4 : index
      %c4294967295 = arith.constant 4294967295 : index
      scf.for %arg0 = %c0 to %c4294967295 step %c1 {
        scf.for %arg1 = %c0 to %c1 step %c1 {
          %0 = aie.objectfifo.acquire @fifoA(Produce, 1) : !aie.objectfifosubview<memref<64x64xbf16>>
          %1 = aie.objectfifo.subview.access %0[0] : !aie.objectfifosubview<memref<64x64xbf16>> -> memref<64x64xbf16>
          scf.for %arg2 = %c0 to %c4 step %c1 {
            %2 = aie.objectfifo.acquire @fifoB(Consume, 1) : !aie.objectfifosubview<memref<64x64xbf16>>
            %3 = aie.objectfifo.subview.access %2[0] : !aie.objectfifosubview<memref<64x64xbf16>> -> memref<64x64xbf16>
            aie.objectfifo.release @fifoB(Consume, 1)
          }
          aie.objectfifo.release @fifoA(Produce, 1)
        }
      }
      aie.end
    }
  }
}
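To check which loop bounds trigger the crash without hand-editing the file each time, the repro above can be generated with its middle and inner trip counts and the ObjectFIFO depth as parameters, then fed to the same aiecc.py command. This is a minimal sketch: the helper names (make_repro, compile_repro) and the %cM/%cN constant names are illustrative, and it assumes aiecc.py is on PATH. It only reports the return code and captured output; it does not interpret the crash.

```python
from __future__ import annotations

import subprocess
from pathlib import Path
from string import Template

# The minimal repro from this issue, with the middle/inner trip counts and the
# ObjectFIFO depth turned into template parameters ($middle, $inner, $depth).
REPRO = Template("""module {
  aie.device(npu1_4col) {
    %tile_0_1 = aie.tile(0, 1)
    %tile_0_2 = aie.tile(0, 2)
    aie.objectfifo @fifoA(%tile_0_2, {%tile_0_1}, $depth : i32) : !aie.objectfifo<memref<64x64xbf16>>
    aie.objectfifo @fifoB(%tile_0_1, {%tile_0_2}, $depth : i32) : !aie.objectfifo<memref<64x64xbf16>>
    %core_0_2 = aie.core(%tile_0_2) {
      %c0 = arith.constant 0 : index
      %c1 = arith.constant 1 : index
      %cM = arith.constant $middle : index
      %cN = arith.constant $inner : index
      %c4294967295 = arith.constant 4294967295 : index
      scf.for %arg0 = %c0 to %c4294967295 step %c1 {
        scf.for %arg1 = %c0 to %cM step %c1 {
          %0 = aie.objectfifo.acquire @fifoA(Produce, 1) : !aie.objectfifosubview<memref<64x64xbf16>>
          %1 = aie.objectfifo.subview.access %0[0] : !aie.objectfifosubview<memref<64x64xbf16>> -> memref<64x64xbf16>
          scf.for %arg2 = %c0 to %cN step %c1 {
            %2 = aie.objectfifo.acquire @fifoB(Consume, 1) : !aie.objectfifosubview<memref<64x64xbf16>>
            %3 = aie.objectfifo.subview.access %2[0] : !aie.objectfifosubview<memref<64x64xbf16>> -> memref<64x64xbf16>
            aie.objectfifo.release @fifoB(Consume, 1)
          }
          aie.objectfifo.release @fifoA(Produce, 1)
        }
      }
      aie.end
    }
  }
}
""")


def make_repro(middle: int, inner: int, depth: int = 2) -> str:
    """Return the repro MLIR with the given loop trip counts and FIFO depth."""
    return REPRO.substitute(middle=middle, inner=inner, depth=depth)


def compile_repro(mlir_text: str, workdir: Path) -> subprocess.CompletedProcess:
    """Write bug.mlir into workdir and run the aiecc.py command from this issue.

    Assumes aiecc.py is on PATH; a nonzero return code means the build failed.
    """
    (workdir / "bug.mlir").write_text(mlir_text)
    cmd = [
        "aiecc.py", "--aie-generate-cdo", "--no-compile-host",
        "--xclbin-name=bug.xclbin", "--aie-generate-npu",
        "--npu-insts-name=bug.txt", "bug.mlir",
    ]
    return subprocess.run(cmd, cwd=workdir, capture_output=True, text=True)
```

For example, calling compile_repro(make_repro(m, 4), Path(d)) for several values of m inside a temporary directory d, and printing each returncode, gives a quick table of which middle-loop trip counts fail. Passing depth=1 to make_repro checks the depth dependence mentioned further down the thread.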
Alternative error
If we remove the two aie.objectfifo.subview.access statements, the error becomes:
/home/github/actions-runner/_work/mlir-aie/mlir-aie/mlir/src/python/MLIRPythonExtension.Core/IRModule.h:433: mlir::python::PyMlirContext::ErrorCapture::~ErrorCapture(): Assertion `errors.empty() && "unhandled captured errors"' failed.
Aborted (core dumped)
Workaround
In the Python code that generates the MLIR, check whether a loop has only a single iteration. If so, emit its body directly instead of the loop.
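A minimal sketch of this check, assuming constant loop bounds and a positive step. The emit_for hook is hypothetical: it stands in for whatever the generator actually uses to build an scf.for and is not an mlir-aie API. In the inlined case the induction variable is passed as a plain integer (the lower bound), whereas a real generator would pass an index constant. As a later comment in this thread shows, the crash also occurs with a nine-iteration middle loop, so this only avoids the single-iteration case.

```python
from __future__ import annotations

from typing import Callable

# Signature of the (hypothetical) hook that builds a real scf.for loop:
# emit_for(lower, upper, step, body)
EmitFor = Callable[[int, int, int, Callable[[int], None]], None]


def trip_count(lower: int, upper: int, step: int) -> int:
    """Number of iterations of `for i in range(lower, upper, step)`, step > 0."""
    if step <= 0:
        raise ValueError("step must be positive")
    return max(0, (upper - lower + step - 1) // step)


def emit_loop(lower: int, upper: int, step: int,
              body: Callable[[int], None], emit_for: EmitFor) -> None:
    """Emit `body` inside a loop, or inline it once if the loop has one iteration."""
    if trip_count(lower, upper, step) == 1:
        body(lower)  # single iteration: emit the body directly, no scf.for
    else:
        emit_for(lower, upper, step, body)
```

Keeping the decision in one wrapper means every loop the generator emits goes through the same check, so the workaround can be removed in one place once the lowering is fixed.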
cc @AndraBisca
After some more testing, this does not appear to affect only single-iteration loops. For example, giving the middle loop nine iterations and the inner loop four produces the same error:
module {
  aie.device(npu1_4col) {
    %tile_0_1 = aie.tile(0, 1)
    %tile_0_2 = aie.tile(0, 2)
    aie.objectfifo @fifoA(%tile_0_2, {%tile_0_1}, 2 : i32) : !aie.objectfifo<memref<64x64xbf16>>
    aie.objectfifo @fifoB(%tile_0_1, {%tile_0_2}, 2 : i32) : !aie.objectfifo<memref<64x64xbf16>>
    %core_0_2 = aie.core(%tile_0_2) {
      %c0 = arith.constant 0 : index
      %c1 = arith.constant 1 : index
      %c9 = arith.constant 9 : index
      %c4 = arith.constant 4 : index
      %c4294967295 = arith.constant 4294967295 : index
      scf.for %arg0 = %c0 to %c4294967295 step %c1 {
        scf.for %arg1 = %c0 to %c9 step %c1 { // <-
          %0 = aie.objectfifo.acquire @fifoA(Produce, 1) : !aie.objectfifosubview<memref<64x64xbf16>>
          %1 = aie.objectfifo.subview.access %0[0] : !aie.objectfifosubview<memref<64x64xbf16>> -> memref<64x64xbf16>
          scf.for %arg2 = %c0 to %c4 step %c1 {
            %2 = aie.objectfifo.acquire @fifoB(Consume, 1) : !aie.objectfifosubview<memref<64x64xbf16>>
            %3 = aie.objectfifo.subview.access %2[0] : !aie.objectfifosubview<memref<64x64xbf16>> -> memref<64x64xbf16>
            aie.objectfifo.release @fifoB(Consume, 1)
          }
          aie.objectfifo.release @fifoA(Produce, 1)
        }
      }
      aie.end
    }
  }
}
I also noticed that the error only triggers when the ObjectFIFO depth is greater than 1. (I think that with depth 1, the loops are not unrolled.)
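To make the depth observation concrete, the sketch below models an ObjectFIFO as cycling through its depth buffers in round-robin order, and assumes, as the comment above guesses, that the lowering unrolls loops by the depth so that each acquire maps to a fixed buffer. Under that model, depth 1 maps every acquire to buffer 0 (nothing to unroll), while depth 2 alternates buffers between iterations. The sketch also checks whether each loop's trip count divides evenly by the depth: in both failing repros the middle loop (1 and 9 iterations) does not, while the inner loop (4 iterations) does. This is only an observation about the two examples; whether it relates to the out-of-range BufferOp index is not established. All names here (Loop, buffer_indices, unroll_leaves_remainder) are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Loop:
    """A counted loop with constant bounds: scf.for %i = lower to upper step step."""
    lower: int
    upper: int
    step: int = 1

    def trip_count(self) -> int:
        if self.step <= 0:
            raise ValueError("step must be positive")
        return max(0, (self.upper - self.lower + self.step - 1) // self.step)


def buffer_indices(num_acquires: int, depth: int) -> list[int]:
    """Buffer used by each successive one-element acquire, assuming the
    ObjectFIFO cycles through its `depth` buffers in round-robin order."""
    return [i % depth for i in range(num_acquires)]


def unroll_leaves_remainder(loop: Loop, factor: int) -> bool:
    """True if unrolling `loop` by `factor` would leave leftover iterations."""
    return loop.trip_count() % factor != 0


# Middle and inner loops of the two repros in this issue; both FIFOs have depth 2.
DEPTH = 2
middle_1, middle_9, inner_4 = Loop(0, 1), Loop(0, 9), Loop(0, 4)
```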
This one should probably be assigned to Andra. It seems that recent changes to the ObjectFifo code are causing an issue for me: the following compiled fine a couple of weeks ago.
Try to build reference_designs/ipu-xrt/matrix_multiplication_array with the following command:
The compiler then crashes during this step (when running make), with the following failed assertion:
Here is a partial stack trace identifying some object fifo code as the culprit:
Thanks in advance for looking into this!