Xilinx / mlir-air

MIT License

Channel Operations May Fail Depending on `DeallocOp()` Placement #660

Open hunhoffe opened 4 months ago

hunhoffe commented 4 months ago

This is part of my effort to write examples using channels in a variety of ways (https://github.com/Xilinx/mlir-air/issues/648).

I've been having trouble getting worker-to-worker (core-to-core within a herd) data movement with channels to work. I managed to get it working, but the program is extremely brittle.

For instance, moving a deallocation earlier made the program work; with all of my deallocations at the end, the output was all zeroes.

My code is here: https://github.com/Xilinx/mlir-air/pull/697. The commit that moved the dealloc ops and fixed the example is https://github.com/Xilinx/mlir-air/pull/697/commits/17162b1be997330352cce8e09ccac8f23fe235a2

hunhoffe commented 3 months ago

I did a redesign (https://github.com/Xilinx/mlir-air/pull/697), but it still does not work: the output data is all zeroes. However, the redesign definitely fixed a bug or two in the original design.

hunhoffe commented 3 months ago

I fixed the design, but I think the previous design should have been valid.