Closed hunhoffe closed 6 hours ago
It should now be possible to also replicate the failure in the minimal-matrix-scalar-add
branch
I saw that the example was instantiating one air.channel
(not broadcasting) is trying to access 4 different aie cores at different time phases via a for-loop nest. This is not going to work for now because air.channel
lowers to circuit-switched routings using aie.flow
ops, which are static connections. This might be enabled in the future when in the future we lower to dynamically routed packet-switched routing.
For now, I can only see the test to work by instantiating four air.channels
(or a bundle of [2,2] air.channels), spatially parallelized via either hand unrolling or using a parent scf.parallel/forall
loop.
https://github.com/Xilinx/mlir-air/pull/637 also contains some minor fixes.
Thank you for the fix ups!
I will go ahead and rewrite the example in a way that uses additional channels (one of the ways you suggested).
I believe I have a version that works here: https://github.com/Xilinx/mlir-air/pull/641
Working version merged in!
From branch
debugging_matrix_scalar_add
, I am working on getting my examplemulti_core_channel
working (this file). The single core version works in this branch, but when I increase the herd size from 1x1 to 2x2, the example does not work any more.To be specific, to replicate run:
When I inspect
programming_examples/matrix_scalar_add/multi_core_channel/build/air_project/npu.air.mlir
, I notice that it looks like all of the data is only going to one core instead of being distributed to all of the cores.