Open Quuxplusone opened 3 years ago
The order of %arg0 and %arg1 is swapped in the output; that was my mistake while pasting.
Actual output is:
module {
  func @trial(%arg0: memref<32xf32>, %arg1: memref<16xf32>) {
    %cst = constant 2.000000e+00 : f32
    affine.for %arg2 = 0 to 32 {
      %0 = affine.load %arg0[%arg2] : memref<32xf32>
      %1 = mulf %0, %cst : f32
      affine.store %1, %arg0[%arg2] : memref<32xf32>
    }
    affine.for %arg2 = 0 to 16 {
      %0 = affine.load %arg0[%arg2] : memref<32xf32>
      %1 = mulf %0, %cst : f32
      affine.store %1, %arg0[%arg2] : memref<32xf32>
      %2 = affine.load %arg0[%arg2] : memref<32xf32>
      %3 = addf %2, %cst : f32
      affine.store %3, %arg1[%arg2] : memref<16xf32>
    }
    return
  }
}
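To make the miscompile concrete, here is a minimal Python sketch that models the output above on plain lists. The `reference` function is an assumption: it is what the unfused input presumably computed (double all 32 elements of %arg0, then write %arg0[i] + 2.0 into %arg1 for the first 16), reconstructed from the output rather than taken from the original test. The function names are hypothetical.

```python
def reference(a):
    """Assumed unfused semantics (reconstructed, not from the report)."""
    a = list(a)
    for i in range(32):          # producer: in-place read-modify-write
        a[i] = a[i] * 2.0
    b = [a[i] + 2.0 for i in range(16)]  # consumer reads only a[0:16]
    return a, b


def fused_as_reported(a):
    """Mirrors the reported output: producer kept AND repeated in the consumer."""
    a = list(a)
    b = [0.0] * 16
    for i in range(32):          # original producer loop, not removed
        a[i] = a[i] * 2.0
    for i in range(16):
        a[i] = a[i] * 2.0        # fused copy of the producer body
        b[i] = a[i] + 2.0        # consumer body
    return a, b


a0 = [1.0] * 32
ref_a, ref_b = reference(a0)
bad_a, bad_b = fused_as_reported(a0)
print(ref_a[0], ref_b[0], bad_a[0], bad_b[0])
```

Under that assumption the first 16 elements of %arg0 are doubled twice, so both memrefs end up with different values than the unfused program would produce.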
A similar issue is seen with the following test:
func @fun(%arg0 : memref<100xf32, 1>, %arg1 : memref<100xf32, 1>) {
  affine.for %d = 0 to 100 {
    %x = affine.load %arg0[%d] : memref<100xf32, 1>
    %xx = addf %x, %x : f32
    affine.store %xx, %arg0[%d] : memref<100xf32, 1>
  }
  affine.for %d = 0 to 95 {
    affine.for %k = 0 to 5 {
      %arg0Vector = affine.load %arg0[%d + %k] : memref<100xf32, 1>
      affine.store %arg0Vector, %arg1[%d] : memref<100xf32, 1>
    }
  }
  return
}
The source is not removed after the first fusion because of a limitation in how the
overlap is determined, and so it gets fused again:
#map = affine_map<(d0, d1) -> (d0 + d1)>
module {
  func @fun(%arg0: memref<100xf32, 1>, %arg1: memref<100xf32, 1>) {
    affine.for %arg2 = 0 to 95 {
      affine.for %arg3 = 0 to 5 {
        %0 = affine.apply #map(%arg2, %arg3)
        %1 = affine.load %arg0[%0] : memref<100xf32, 1>
        %2 = addf %1, %1 : f32
        affine.store %2, %arg0[%0] : memref<100xf32, 1>
        %3 = affine.apply #map(%arg2, %arg3)
        %4 = affine.load %arg0[%3] : memref<100xf32, 1>
        %5 = addf %4, %4 : f32
        affine.store %5, %arg0[%3] : memref<100xf32, 1>
        %6 = affine.load %arg0[%arg2 + %arg3] : memref<100xf32, 1>
        affine.store %6, %arg1[%arg2] : memref<100xf32, 1>
      }
    }
    return
  }
}
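The effect of the double fusion can be checked with a small Python model of the two versions of @fun. This is only a sketch on plain lists: `unfused` follows the test input, `fused_twice` follows the output printed above, and both function names are hypothetical.

```python
def unfused(a, b):
    """Model of the @fun test input."""
    a, b = list(a), list(b)
    for d in range(100):              # producer: a[d] = a[d] + a[d]
        a[d] = a[d] + a[d]
    for d in range(95):
        for k in range(5):            # consumer: b[d] = a[d + k]
            b[d] = a[d + k]
    return a, b


def fused_twice(a, b):
    """Model of the reported output: producer body fused in two times."""
    a, b = list(a), list(b)
    for d in range(95):
        for k in range(5):
            idx = d + k               # affine.apply #map(%arg2, %arg3)
            a[idx] = a[idx] + a[idx]  # first fused copy of the producer
            a[idx] = a[idx] + a[idx]  # second fused copy of the producer
            b[d] = a[idx]             # consumer body
    return a, b


ones, zeros = [1.0] * 100, [0.0] * 100
good_a, good_b = unfused(ones, zeros)
bad_a, bad_b = fused_twice(ones, zeros)
print(good_b[0], bad_b[0], good_a[99], bad_a[99])
```

In this model each element of %arg0 is rewritten once per (d, k) pair that maps to it, instead of exactly once, and element 99 is never updated because d + k never reaches it. That is why the producer loop cannot simply be folded into the consumer here.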
Incorrect affine loop fusion occurs when the producer has a self-dependence. Because the consumer consumes only part of the memref that the producer writes, the producer (source) cannot be deleted after the fusion.
Example below:
-affine-loop-fusion
produces the following output: