calyxir / calyx

Intermediate Language (IL) for Hardware Accelerator Generators
https://calyxir.org
MIT License

Simulating SCF designs #759

Closed rachitnigam closed 2 years ago

rachitnigam commented 2 years ago

@cgyurgyik a next task to work on will be getting large-ish programs from the SCF dialect to simulate. I think it’ll make a good case study, especially if you use solely the Calyx debugger to find issues in the generated code.

rachitnigam commented 2 years ago

@cgyurgyik thoughts on this? Should we try to do this or give up?

cgyurgyik commented 2 years ago

So for SCF to Calyx, I’ve managed to write a small program that does a memcpy in SCF this weekend, and simulate properly with Calyx (with a few changes to the CIRCT side). However, anything even slightly more complex (e.g. multiple operations on the loaded memory) has resulted in errors because of the CIRCT side, e.g. double assignments to an address port. I imagine most initial debugging will just be looking at the emitted code from the SCFToCalyx pass, and seeing what's wrong with it.

That brings us to the next question: what programs are we looking to simulate?

As for “how much work”, it’ll depend on what we want to accomplish, and I’m also awful at answering this question :-).

rachitnigam commented 2 years ago

That brings us to the next question: what programs are we looking to simulate?

I think a baseline would be matrix multiply.

Thanks for doing the investigating. Maybe we can sync Tues/Wed and figure out if it's worth investing more time into this or not.

cgyurgyik commented 2 years ago

Currently working on getting a naive matrix multiply to compile. So far, there's a bug with memory inlining: https://github.com/llvm/circt/issues/2108 . The fix shouldn't be too bad. I'll also need to add support for multi-cycle primitives (for a MulOp primitive).

cgyurgyik commented 2 years ago

#!/bin/bash

# arg $1 should be the path to the .mlir file you want to lower.
# arg $2 should be the path to the data you want for externalized memories.

# Path to the directory containing the CIRCT binaries.
CIRCT_BINARY="</your/path/to/circt/build/bin>" # e.g. /Users/cgyurgyik/Projects/circt/build/bin

# Lower from SCF, and then emit Calyx for the native compiler.
"$CIRCT_BINARY/circt-opt" --lower-scf-to-calyx -canonicalize "$1" |
  "$CIRCT_BINARY/circt-translate" --export-calyx > lowered.futil

# Simulate the lowered design through fud, using $2 as the memory data.
fud e lowered.futil --to dat -s verilog.data "$2"

module {
  func @main() {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %c4 = arith.constant 4 : index
    %lhs = memref.alloc() : memref<4x4xi32>
    %rhs = memref.alloc() : memref<4x4xi32>
    %res = memref.alloc() : memref<4x4xi32>

    scf.while(%i = %c0) : (index) -> (index) {
      %cond = arith.cmpi ult, %i, %c4 : index
      scf.condition(%cond) %i : index
    } do {
    ^bb0(%i: index):
      scf.while(%j = %c0) : (index) -> (index) {
        %cond = arith.cmpi ult, %j, %c4 : index
        scf.condition(%cond) %j : index
      } do {
      ^bb1(%j: index):
        scf.while(%k = %c0) : (index) -> (index) {
          %cond = arith.cmpi ult, %k, %c4 : index
          scf.condition(%cond) %k : index
        } do {
        ^bb2(%k: index):
          %load_lhs = memref.load %lhs[%i, %k] : memref<4x4xi32> // lhs[i][k]
          %load_rhs = memref.load %rhs[%k, %j] : memref<4x4xi32> // rhs[k][j]
          %load_res = memref.load %res[%i, %j] : memref<4x4xi32> // res[i][j]
          %mul = arith.muli %load_lhs, %load_rhs : i32           // mul = lhs[i][k] * rhs[k][j]
          %sum = arith.addi %load_res, %mul : i32                // sum = res[i][j] + mul
          memref.store %sum, %res[%i, %j] : memref<4x4xi32>      // res[i][j] = sum
          %incr = arith.addi %k, %c1 : index
          scf.yield %incr : index
        }
        %incr = arith.addi %j, %c1 : index
        scf.yield %incr : index
      }
      %incr = arith.addi %i, %c1 : index
      scf.yield %incr : index
    }
    return
  }
}

{
  "mem_0": {
    "data": [[1,2,3,4], [1,2,3,4], [1,2,3,4], [1,2,3,4]],
    "format": {
      "numeric_type": "bitnum",
      "is_signed": false,
      "width": 32
    }
  },
  "mem_1": {
    "data": [[1,2,3,4], [1,2,3,4], [1,2,3,4], [1,2,3,4]],
    "format": {
      "numeric_type": "bitnum",
      "is_signed": false,
      "width": 32
    }
  },
  "mem_2": {
    "data": [[0,0,0,0],[0,0,0,0],[0,0,0,0],[0,0,0,0]],
    "format": {
      "numeric_type": "bitnum",
      "is_signed": false,
      "width": 32
    }
  }
}

Above is a very basic matrix multiply: the lowering script, the SCF program, and the memory data. It will work after https://github.com/llvm/circt/pull/2139 and https://github.com/llvm/circt/pull/2137 are merged.
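For checking the simulated output, a small software reference model can help. The sketch below mirrors the three nested loops of the SCF program and rebuilds the same memory data in Python. It assumes (this is not confirmed in the thread) that mem_0, mem_1, and mem_2 correspond to %lhs, %rhs, and %res in allocation order; the helper names `matmul_reference` and `memory` are made up for illustration.

```python
import json

N = 4  # matrix dimension, matching memref<4x4xi32> in the SCF program


def matmul_reference(lhs, rhs, res):
    """Mirror the nested scf.while loops: res[i][j] += lhs[i][k] * rhs[k][j]."""
    out = [row[:] for row in res]
    for i in range(N):
        for j in range(N):
            for k in range(N):
                # Wrap to 32 bits, since the memories hold unsigned 32-bit values.
                out[i][j] = (out[i][j] + lhs[i][k] * rhs[k][j]) & 0xFFFFFFFF
    return out


def memory(data):
    """Build one memory entry in the same layout as the JSON data file."""
    return {
        "data": data,
        "format": {"numeric_type": "bitnum", "is_signed": False, "width": 32},
    }


lhs = [[1, 2, 3, 4] for _ in range(N)]
rhs = [[1, 2, 3, 4] for _ in range(N)]
res = [[0] * N for _ in range(N)]

# Input data file contents. Assumption: mem_0 = lhs, mem_1 = rhs, mem_2 = res.
data_json = json.dumps(
    {"mem_0": memory(lhs), "mem_1": memory(rhs), "mem_2": memory(res)}, indent=2
)

# What mem_2 should hold after simulation, under the assumption above.
expected = matmul_reference(lhs, rhs, res)
print(expected)
```

With this data every row of the result is [10, 20, 30, 40], since each lhs row sums to 10 and column j of rhs is constant at j + 1. Comparing that against mem_2 in the output of the fud command is a quick sanity check on the lowered design.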

rachitnigam commented 2 years ago

Nice job!!

I would've thought the equivalent fud command would work as well since we have a stage to convert to and from MLIR and Calyx?

cgyurgyik commented 2 years ago

fud allows for native compiler -> MLIR. I imagine using CIRCT binaries would require registering an external stage?

rachitnigam commented 2 years ago

There's nothing special about external stages. Take a look at fud/icarus.py, which registers an external FutilStage for generating the right kind of Verilog.

cgyurgyik commented 2 years ago

Huh? I was just suggesting an external stage since CIRCT is not necessarily a core piece of the Calyx infra.

rachitnigam commented 2 years ago

Oh sure, that's reasonable. It used to be the case a month ago that you couldn't do the same things in an external stage that you can in an internal stage but that's not true anymore.

cgyurgyik commented 2 years ago

Ah ok, thanks for explaining.

On Thu, Nov 11, 2021 at 13:06 Rachit Nigam @.***> wrote:

Oh sure, that's reasonable. It used to be the case a month ago that you couldn't do the same things in an external stage that you can in an internal stage but that's not true anymore.

rachitnigam commented 2 years ago

@cgyurgyik we should sync up on this if you're still interested in making things work with the SCF frontend.

cgyurgyik commented 2 years ago

For the time being, I need to focus on some work-related problems. As for the SCF frontend, I think the next steps will be:

  1. Add support for division / remainder in SCFToCalyx (https://github.com/llvm/circt/issues/2141)
  2. Add support for par in the SCFToCalyx lowering pass.
  3. Find some SCF dialect generators, and look at the codegen for common patterns that need to be lowered.

rachitnigam commented 2 years ago

After talking to @hanchenye and @stephenneuendorffer, it seems that the ScaleHLS flow (https://github.com/hanchenye/scalehls) can drive work on this. In a nutshell, ScaleHLS can compile down to affine dialect programs which can be lowered to scf programs. Long-term, it'd be awesome to connect the ScaleHLS flow with SCFToCalyx to enable end-to-end RTL generation.

rachitnigam commented 2 years ago

With @mikeurbach's recent work on getting pipelines simulating through Calyx, we can call this done!