cornell-zhang / hcl-dialect

HeteroCL-MLIR dialect for accelerator design
https://cornell-zhang.github.io/heterocl/index.html

[Op] Intra-kernel .to() legality check using Polymer/PoCC in MLIR #188

Closed hecmay closed 1 year ago

hecmay commented 1 year ago

Summary

In this PR, I implemented the following features. NB: The names of these operators are subject to change; we can choose better names later.

// Unfold the target loop axis into a PE array of %factor PEs
%pe_array = hcl.unfold(%loop_handle, %factor)
// Broadcast weight %w to all three PEs in the unfolded PE array
%pe0_w = hcl.to(%w, %pe_array) { "pe_index" = [0,1,2] }

Implementation details

Example

Here is a quick example to build a weight-stationary systolic array using hcl.unfold() and hcl.intra_kernel_to():

module {
    func.func @conv1d(%A: memref<64xf32>, %W: memref<3xf32>, %C: memref<61xf32>)
    {
        %s = hcl.create_op_handle "s"
        %li = hcl.create_loop_handle %s, "i"
        %lj = hcl.create_loop_handle %s, "j"

        // Polymer (PoCC) post-processed loop nest
        affine.for %i = 0 to 61 {
            affine.for %j = 0 to 3 {
                func.call @S0(%i, %j, %C, %j, %A, %W) : (index, index, memref<61xf32>, index, memref<64xf32>, memref<3xf32>) -> ()
            // CHECK:  } {dep_distance = 0 : i64, loop_name = "i", op_name = "s"} 
            } { loop_name = "j", dep_distance = 1 }
        } { loop_name = "i", op_name = "s", dep_distance = 0 }

        %pe_array = hcl.unfold( %lj, 3 ) 
        hcl.to(%W : memref<3xf32>, %pe_array) { pe_index = [0,1,2] } -> memref<1xf32>
        %pe0_w = hcl.to(%W: memref<3xf32>, %pe_array) { pe_index = [0] } -> memref<1xf32>
        %pe1_w = hcl.to(%pe0_w: memref<1xf32>, %pe_array) { pe_index = [1] } -> memref<1xf32>
        return
    }
}
hecmay commented 1 year ago

Thanks for the PR! Can you also provide the generated MLIR code in the introduction, so that we can better understand what kind of transformations you have done?

Right now, these ops do not transform the code. Specifically, when each .to() is applied, the compiler only checks whether that .to() is legal, based on the information from Polymer that is saved as an attribute on the target loop.

My plan is to do the transformation lazily: a given .to() may violate the legality assumptions, so it is better to let the compiler transform the code at the very end, once every .to() has been verified. The compiler only performs the actual transformation (i.e., explicit PE instantiation and FIFO connection) after legality is verified.
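The lazy scheme described above can be sketched as a two-phase loop: check every pending .to() first, and only touch the code if all checks pass. This is a minimal illustration rather than code from this PR; `PendingMove`, `verifyThenTransform`, and the callback types are hypothetical names, and the legality predicate is left to the caller.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical record of one requested .to(); not a type from hcl-dialect.
struct PendingMove {
  std::string name;  // e.g. the name of the result value, "pe0_w"
  int peIndex;       // target PE inside the unfolded array
};

using Check = std::function<bool(const PendingMove &)>;
using Apply = std::function<void(const PendingMove &)>;

// Phase 1 verifies every move; phase 2 runs only if all of them are legal,
// so a single illegal .to() leaves the code completely untransformed.
bool verifyThenTransform(const std::vector<PendingMove> &moves,
                         const Check &isLegal, const Apply &apply) {
  for (const PendingMove &m : moves)
    if (!isLegal(m))
      return false;
  for (const PendingMove &m : moves)
    apply(m);
  return true;
}
```

Keeping the two loops separate is the point of the design: nothing has to be rolled back when a later .to() turns out to be illegal.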

hecmay commented 1 year ago

@chhzh123 I renamed the PR to "legality check using Polymer/PoCC" as the actual transformation part is not included. I will open another PR to add the actual transformation.

chhzh123 commented 1 year ago

Got it. So what will .unfold do in this case? I only see dep_distance = 0 in the attribute, which is supposed to be generated by .to()? Then how is the information of unfold preserved?

hecmay commented 1 year ago

I just updated the test case. The unfold() op inserts the desired factor into the target loop as an attribute, and each .to() op checks whether the fine-grained data movement is legal based on the criteria described in the AutoSA paper.
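As a rough sketch of how the two ops could interact, the snippet below records an unfold factor on a loop and then checks a .to() against it. The criterion is an assumption made for illustration: it loosely follows AutoSA's requirement that a loop mapped to the PE dimension carries dependence distances no greater than one, and it is not taken from this PR. `LoopAttrs`, `recordUnfold`, and `isToLegal` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the attributes attached to one affine.for loop.
struct LoopAttrs {
  int64_t tripCount;     // e.g. 3 for the "j" loop in the example
  int64_t depDistance;   // mirrors the dep_distance attribute from Polymer
  int64_t unfoldFactor;  // 0 means the loop has not been unfolded
};

// Sketch of unfold(): only records the factor; the loop body is not rewritten.
// Assumption: the factor must be positive and no larger than the trip count.
bool recordUnfold(LoopAttrs &loop, int64_t factor) {
  if (factor <= 0 || factor > loop.tripCount)
    return false;
  loop.unfoldFactor = factor;
  return true;
}

// Sketch of the .to() check. Assumed rule: the loop must already be unfolded,
// the PE index must be in range, and the dependence distance must be 0 or 1
// so that data only moves between neighboring PEs.
bool isToLegal(const LoopAttrs &loop, int64_t peIndex) {
  if (loop.unfoldFactor == 0)
    return false;
  if (peIndex < 0 || peIndex >= loop.unfoldFactor)
    return false;
  return loop.depDistance >= 0 && loop.depDistance <= 1;
}
```

With the "j" loop from the example (trip count 3, dep_distance 1), PE indices 0 through 2 pass this check once the loop is unfolded by 3.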

chhzh123 commented 1 year ago

Also, can you rebase the commits? We just upgraded the LLVM version. It seems this PR still uses the old version of LLVM for testing, so I'm not sure whether it will cause errors after merging. We have a prebuilt LLVM 18.x project under the /work/shared/users/common/llvm-project-18.x folder, or you can use my docker image chhzh123/llvm-project for testing.

hecmay commented 1 year ago

@chhzh123 Sure, will do that. I have no other changes for this one.

hecmay commented 1 year ago

It seems that something in tblgen is not working as expected after rebasing. I am trying to debug it:

/scratch/users/sx233/hcl-dialect/lib/Transforms/LoopTransformations.cpp: In function ‘mlir::LogicalResult mlir::hcl::runIntraKernelOpCheck(mlir::func::FuncOp&, mlir::hcl::IntraKernelToOp&)’:
/scratch/users/sx233/hcl-dialect/lib/Transforms/LoopTransformations.cpp:455:44: error: ‘class mlir::hcl::IntraKernelToOp’ has no member named ‘pe_array’
  455 |       dyn_cast<CreateLoopHandleOp>(intraOp.pe_array().getDefiningOp());
      |                                            ^~~~~~~~
/scratch/users/sx233/hcl-dialect/lib/Transforms/LoopTransformations.cpp: In function ‘mlir::LogicalResult mlir::hcl::runUnfolding(mlir::func::FuncOp&, mlir::hcl::UnfoldOp&)’:
/scratch/users/sx233/hcl-dialect/lib/Transforms/LoopTransformations.cpp:502:35: error: ‘class mlir::hcl::UnfoldOp’ has no member named ‘factor’
  502 |   auto optional_factor = unfoldOp.factor();
chhzh123 commented 1 year ago

@hecmay Yeah, the MLIR API changed. The generated accessors are now prefixed, so you need to use getPeArray() and getFactor() instead of pe_array() and factor().
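For the two errors above, that means intraOp.getPeArray() at line 455 and unfoldOp.getFactor() at line 502. The renaming rule (a `get` prefix plus the snake_case name converted to UpperCamelCase) can be approximated with the small helper below. `getterNameFor` is a hypothetical function written for illustration; it is not part of MLIR and ignores the edge cases the real generator handles.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Approximates how a snake_case operand/attribute name maps to its prefixed
// accessor: drop each underscore, capitalize the next character, prepend "get".
std::string getterNameFor(const std::string &odsName) {
  std::string result = "get";
  bool upperNext = true;  // the first character is always capitalized
  for (char c : odsName) {
    if (c == '_') {
      upperNext = true;
      continue;
    }
    result += upperNext
                  ? static_cast<char>(std::toupper(static_cast<unsigned char>(c)))
                  : c;
    upperNext = false;
  }
  return result;
}
```

Calling `getterNameFor("pe_array")` yields `getPeArray`, and `getterNameFor("factor")` yields `getFactor`, matching the names suggested above.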