Xilinx / mlir-air

MIT License
71 stars 26 forks source link

Worker-to-Self Channel Example #654

Open hunhoffe opened 1 week ago

hunhoffe commented 1 week ago

I am working on writing a worker-to-worker data transfer example for channels (as part of the grouping of examples that exercise various features of channels, https://github.com/Xilinx/mlir-air/issues/648).

Draft PR is here: https://github.com/Xilinx/mlir-air/pull/653

I am basing it off the code in the channel_size example (PR waiting to be merged here: https://github.com/Xilinx/mlir-air/pull/642)

The channel_size example works well for me. As an intermediate step to adding worker-to-worker communication to that example, I tried to have each worker send data to itself over a channel. That is the version of the code that is pushed in the draft PR https://github.com/Xilinx/mlir-air/pull/653. The particular file of interest is this one.

When I run with this intermediate step, I get the following error:

Using aiecc.py from:  /scratch/ehunhoff/mlir-air/mlir-aie/install/bin/..
Running: builtin.module(air-insert-launch-and-segment-around-herd,func.func(air-lower-herd-parallel),air-dma-to-channel,canonicalize,cse,air-specialize-channel-wrap-and-stride,func.func(air-renumber-dma),func.func(convert-linalg-to-loops),air-place-herds{num-rows=6 num-cols=4 row-anchor=2 col-anchor=0})
Running: builtin.module(air-to-aie{emit-while-loop=false row-offset=2 col-offset=0 device=npu1_4col})
python3: /scratch/ehunhoff/mlir-air/mlir/lib/Conversion/AIRToAIESchedulingUtils.cpp:956: void xilinx::air::simpleDMAChannelAllocation(std::vector<MemcpyBundleAsFlow> &, ShimDMAAllocator &, MemTileDMAAllocator &, TileDMAAllocator &): Assertion `core' failed.
Aborted (core dumped)
make: *** [Makefile:7: run] Error 134

My question is:

erwei-xilinx commented 1 week ago

Thanks for investigating into worker-to-worker communication scenario. This scenario hasn't been exercised, so a bug at air compiler is possible. The simpleDMAChannelAllocation is implemented with a naive logic to allocate DMA channels to data movements with endpoints of different memory space; in your case the endpoints are both L1, which might lead to unexpected behaviour.

erwei-xilinx commented 1 week ago

And this assumption could affect other methods and passes in air compiler.

hunhoffe commented 1 week ago

Ah, I see!

As a path forward, would it make sense to:

hunhoffe commented 1 week ago

An update on this plan:

I will leave this issue up until the l1-l1 worker-to-self example is supported (and the lit test passes) or it is decided this is definitely an illegal action, which can be caught by the verifier (hopefully with a helpful error).

hunhoffe commented 4 days ago

This is an issue that directly addresses accessing l2 allocations in the channel: https://github.com/Xilinx/mlir-air/issues/666

This PR takes that fix and attempts to fix the worker-to-self example using this new capability; however, it still fails with:

python3: mlir-air/mlir/lib/Conversion/AIRToAIESchedulingUtils.cpp:956: void xilinx::air::simpleDMAChannelAllocation(std::vector<MemcpyBundleAsFlow> &, ShimDMAAllocator &, MemTileDMAAllocator &, TileDMAAllocator &): Assertion `core' failed.
Aborted (core dumped)