Quuxplusone / LLVMBugzillaTest

Incorrect loop fusion in case of a producer with self-dependence #48328

Open Quuxplusone opened 3 years ago

Quuxplusone commented 3 years ago
Bugzilla Link PR49359
Status NEW
Importance P normal
Reported by Vinayaka Bandishti (vinayaka@polymagelabs.com)
Reported on 2021-02-25 21:40:59 -0800
Last modified on 2021-05-05 23:10:07 -0700
Version unspecified
Hardware PC Linux
CC joker.eph@gmail.com, sumesh.uk@gmail.com, uday@polymagelabs.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also

Affine loop fusion produces incorrect code when the producer has a self-dependence, i.e. it loads from and stores to the same memref. Because the consumer reads only part of the memref that the producer writes, the producer (source) loop cannot be deleted after fusion.

Example below:

func @trial(%producer : memref<32xf32>, %consumer: memref<16xf32>){
  %cst = constant 2.000000e+00 : f32
  affine.for %arg3 = 0 to 32 {
    %0 = affine.load %producer[%arg3] : memref<32xf32>
    %2 = mulf %0, %cst : f32
    affine.store %2, %producer[%arg3] : memref<32xf32>
  }
  affine.for %arg3 = 0 to 16 {
    %0 = affine.load %producer[%arg3] : memref<32xf32>
    %2 = addf %0, %cst : f32
    affine.store %2, %consumer[%arg3] : memref<16xf32>
  }
  return
}

Running -affine-loop-fusion produces the following output:

module  {
  func @trial(%arg0: memref<16xf32>, %arg1: memref<32xf32>) {
    %cst = constant 2.000000e+00 : f32
    affine.for %arg2 = 0 to 32 {
      %0 = affine.load %arg1[%arg2] : memref<32xf32>
      %1 = mulf %0, %cst : f32
      affine.store %1, %arg1[%arg2] : memref<32xf32>
    }
    affine.for %arg2 = 0 to 16 {
      %0 = affine.load %arg1[%arg2] : memref<32xf32>
      %1 = mulf %0, %cst : f32
      affine.store %1, %arg1[%arg2] : memref<32xf32>
      %2 = affine.load %arg1[%arg2] : memref<32xf32>
      %3 = addf %2, %cst : f32
      affine.store %3, %arg0[%arg2] : memref<16xf32>
    }
    return
  }
}
Quuxplusone commented 3 years ago

The order of %arg0 and %arg1 is swapped in the output above; that was my mistake while pasting.

Actual output is:

module  {
  func @trial(%arg0: memref<32xf32>, %arg1: memref<16xf32>) {
    %cst = constant 2.000000e+00 : f32
    affine.for %arg2 = 0 to 32 {
      %0 = affine.load %arg0[%arg2] : memref<32xf32>
      %1 = mulf %0, %cst : f32
      affine.store %1, %arg0[%arg2] : memref<32xf32>
    }
    affine.for %arg2 = 0 to 16 {
      %0 = affine.load %arg0[%arg2] : memref<32xf32>
      %1 = mulf %0, %cst : f32
      affine.store %1, %arg0[%arg2] : memref<32xf32>
      %2 = affine.load %arg0[%arg2] : memref<32xf32>
      %3 = addf %2, %cst : f32
      affine.store %3, %arg1[%arg2] : memref<16xf32>
    }
    return
  }
}
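
The effect of the output above can be checked by hand with a small simulation. The Python sketch below models both loop nests on plain lists; the function names and the initial values (all ones for the producer, all zeros for the consumer) are illustrative assumptions, not part of the report.

```python
def trial_original(producer, consumer, cst=2.0):
    """Models the unfused input: a producer loop, then a consumer loop."""
    producer, consumer = list(producer), list(consumer)
    for i in range(32):
        producer[i] = producer[i] * cst   # load, mulf, store on %producer
    for i in range(16):
        consumer[i] = producer[i] + cst   # load %producer, addf, store %consumer
    return producer, consumer


def trial_fused_as_reported(producer, consumer, cst=2.0):
    """Models the pass output: source loop kept AND its body copied into the consumer."""
    producer, consumer = list(producer), list(consumer)
    for i in range(32):                   # source loop is still present
        producer[i] = producer[i] * cst
    for i in range(16):                   # fused loop repeats the producer body
        producer[i] = producer[i] * cst
        consumer[i] = producer[i] + cst
    return producer, consumer


init_producer = [1.0] * 32
init_consumer = [0.0] * 16

ref_p, ref_c = trial_original(init_producer, init_consumer)
bad_p, bad_c = trial_fused_as_reported(init_producer, init_consumer)

print(ref_p[:2], ref_c[:2])
print(bad_p[:2], bad_c[:2])
```

Under these assumptions the first 16 producer elements are multiplied by the constant twice in the fused version, so both the producer memref and the consumer memref end up with different values from the unfused program.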
Quuxplusone commented 3 years ago
A similar issue is seen with the following test:

func @fun(%arg0 : memref<100xf32, 1>, %arg1 : memref<100xf32, 1>) {
  affine.for %d = 0 to 100 {
    %x = affine.load %arg0[%d] : memref<100xf32, 1>
    %xx = addf %x, %x : f32
    affine.store %xx, %arg0[%d] : memref<100xf32, 1>
  }
  affine.for %d = 0 to 95 {
    affine.for %k = 0 to 5 {
      %arg0Vector = affine.load %arg0[%d + %k] : memref<100xf32, 1>
      affine.store %arg0Vector, %arg1[%d] : memref<100xf32, 1>
    }
  }
  return
}

The source is not removed after the first fusion, due to a limitation in how the
overlap is determined, and so it gets fused again:

#map = affine_map<(d0, d1) -> (d0 + d1)>
module  {
  func @fun(%arg0: memref<100xf32, 1>, %arg1: memref<100xf32, 1>) {
    affine.for %arg2 = 0 to 95 {
      affine.for %arg3 = 0 to 5 {
        %0 = affine.apply #map(%arg2, %arg3)
        %1 = affine.load %arg0[%0] : memref<100xf32, 1>
        %2 = addf %1, %1 : f32
        affine.store %2, %arg0[%0] : memref<100xf32, 1>
        %3 = affine.apply #map(%arg2, %arg3)
        %4 = affine.load %arg0[%3] : memref<100xf32, 1>
        %5 = addf %4, %4 : f32
        affine.store %5, %arg0[%3] : memref<100xf32, 1>
        %6 = affine.load %arg0[%arg2 + %arg3] : memref<100xf32, 1>
        affine.store %6, %arg1[%arg2] : memref<100xf32, 1>
      }
    }
    return
  }
}
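
The same kind of hand check works for this second test. The Python sketch below models the input and the doubly fused output on plain lists; the function names and the initial values are illustrative assumptions, not part of the report.

```python
def fun_original(arg0, arg1):
    """Models the unfused input: double every element, then copy a sliding window."""
    arg0, arg1 = list(arg0), list(arg1)
    for d in range(100):
        arg0[d] = arg0[d] + arg0[d]
    for d in range(95):
        for k in range(5):
            arg1[d] = arg0[d + k]
    return arg0, arg1


def fun_fused_as_reported(arg0, arg1):
    """Models the pass output: the producer body appears twice inside the consumer nest."""
    arg0, arg1 = list(arg0), list(arg1)
    for d in range(95):
        for k in range(5):
            i = d + k
            arg0[i] = arg0[i] + arg0[i]   # first fused copy of the producer
            arg0[i] = arg0[i] + arg0[i]   # second fused copy of the producer
            arg1[d] = arg0[i]
    return arg0, arg1


init0 = [1.0] * 100
init1 = [0.0] * 100

ref_a0, ref_a1 = fun_original(init0, init1)
bad_a0, bad_a1 = fun_fused_as_reported(init0, init1)

print(ref_a0[0], ref_a0[50], ref_a0[99])
print(bad_a0[0], bad_a0[50], bad_a0[99])
```

In this model each element of %arg0 is doubled twice for every (d, k) pair that reaches it, so interior elements are scaled many times over, while element 99 is never written because d + k only reaches 98.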