AIRSpecializeChannelWrapAndStride: Hoist loop-invariant pure ops for dominance

The standard case that the pass already supports is shown below, where the affine.apply can be folded into the scf.for loop bounds:

    scf.for %arg4 = %c0 to %c8 step %c1 {
      %0 = affine.apply #map()[%arg4]
      %1 = air.channel.put async @channel_21[] (%arg0[%0] [%c8] [%c1]) : (memref<128x256xi32>)
    }

However, user-provided code could also take the following form:

    scf.for %arg4 = %c0 to %c8 step %c1 {
      %0 = affine.apply #map()[%arg1]
      %1 = air.channel.put async @channel_21[] (%arg0[%0] [%c8] [%c1]) : (memref<128x256xi32>)
    }

Here the affine.apply cannot be folded into the loop bounds, because its operand %arg1 is not the loop induction variable. In this case, the pass should hoist the affine.apply out of the for loop to preserve value dominance. The hoist is safe because affine.apply is a pure op that does not touch memory, and its operand is loop-invariant.
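
For illustration, the second example would be expected to look roughly as follows once the affine.apply has been hoisted. This is a hand-written sketch rather than actual pass output, and it assumes %arg1 and #map are defined outside the loop as in the snippet above:

    // Loop-invariant and pure, so it can be moved above the loop.
    %0 = affine.apply #map()[%arg1]
    scf.for %arg4 = %c0 to %c8 step %c1 {
      %1 = air.channel.put async @channel_21[] (%arg0[%0] [%c8] [%c1]) : (memref<128x256xi32>)
    }

With %0 defined before the scf.for, it dominates the loop and everything inside it, so it stays a valid operand of the air.channel.put wherever the pass moves or rewrites that op.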
