Closed rachitnigam closed 2 years ago
@cgyurgyik thoughts on this? Should we try to do this or give up?
So for SCF to Calyx, I’ve managed to write a small program that does a memcpy in SCF this weekend, and simulate properly with Calyx (with a few changes to the CIRCT side). However, anything even slightly more complex (e.g. multiple operations on the loaded memory) has resulted in errors because of the CIRCT side, e.g. double assignments to an address port. I imagine most initial debugging will just be looking at the emitted code from the SCFToCalyx pass, and seeing what's wrong with it.
That brings us to the next question: what programs are we looking to simulate?
As far as “how much work”, it’ll depend on what we want to accomplish, and I’m also awful at answering this question :-).
> That brings us to the next question: what programs are we looking to simulate?
I think a baseline would be matrix multiply.
Thanks for doing the investigating. Maybe we can sync Tues/Wed and figure out if it's worth investing more time into this or not.
Currently working on getting a naive matrix multiply to compile. So far, there's a bug with memory inlining: https://github.com/llvm/circt/issues/2108 . The fix shouldn't be too bad. I'll also need to add support for multi-cycle primitives (for a MulOp primitive).
#!/bin/bash
# arg $1 should be the path to the .mlir file you want to lower.
# arg $2 should be the path to the data you want for externalized memories.
# Path to the directory containing the LLVM-CIRCT binaries.
CIRCT_BINARY="</your/path/to/circt/build/bin>" # e.g. /Users/cgyurgyik/Projects/circt/build/bin
# Lower from SCF, and then emit native compiler Calyx.
"$CIRCT_BINARY/circt-opt" --lower-scf-to-calyx -canonicalize "$1" |
"$CIRCT_BINARY/circt-translate" --export-calyx > lowered.futil
# Compile to Verilog and simulate with the given memory data.
fud e lowered.futil --to dat -s verilog.data "$2"
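For anyone who would rather drive the same flow from Python, here is a minimal sketch of the script above. It only reuses the binaries and flags that appear in the script; the function names `build_commands` and `run_pipeline` are hypothetical, and the paths passed in are placeholders.

```python
import subprocess
from pathlib import Path


def build_commands(circt_bin, mlir_file, data_file, out_file="lowered.futil"):
    """Return the three command lines from the shell script as argument lists."""
    circt_bin = Path(circt_bin)
    # Lower from SCF to the Calyx dialect.
    lower = [str(circt_bin / "circt-opt"), "--lower-scf-to-calyx",
             "-canonicalize", str(mlir_file)]
    # Emit native compiler Calyx (reads the lowered MLIR on stdin).
    export = [str(circt_bin / "circt-translate"), "--export-calyx"]
    # Simulate through fud with the externalized memory data.
    simulate = ["fud", "e", out_file, "--to", "dat",
                "-s", "verilog.data", str(data_file)]
    return lower, export, simulate


def run_pipeline(circt_bin, mlir_file, data_file, out_file="lowered.futil"):
    """Run the commands in order, piping circt-opt into circt-translate."""
    lower, export, simulate = build_commands(circt_bin, mlir_file, data_file, out_file)
    lowered = subprocess.run(lower, check=True, capture_output=True)
    exported = subprocess.run(export, input=lowered.stdout,
                              check=True, capture_output=True)
    Path(out_file).write_bytes(exported.stdout)
    subprocess.run(simulate, check=True)
```

Calling `run_pipeline("/your/path/to/circt/build/bin", "matmul.mlir", "matmul.json")` would then mirror one invocation of the shell script; `check=True` makes a failing stage raise instead of silently writing an empty `lowered.futil`.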
module {
  func @main() {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %c4 = arith.constant 4 : index
    %lhs = memref.alloc() : memref<4x4xi32>
    %rhs = memref.alloc() : memref<4x4xi32>
    %res = memref.alloc() : memref<4x4xi32>
    scf.while(%i = %c0) : (index) -> (index) {
      %cond = arith.cmpi ult, %i, %c4 : index
      scf.condition(%cond) %i : index
    } do {
    ^bb0(%i: index):
      scf.while(%j = %c0) : (index) -> (index) {
        %cond = arith.cmpi ult, %j, %c4 : index
        scf.condition(%cond) %j : index
      } do {
      ^bb1(%j: index):
        scf.while(%k = %c0) : (index) -> (index) {
          %cond = arith.cmpi ult, %k, %c4 : index
          scf.condition(%cond) %k : index
        } do {
        ^bb2(%k: index):
          %load_lhs = memref.load %lhs[%i, %k] : memref<4x4xi32> // lhs[i][k]
          %load_rhs = memref.load %rhs[%k, %j] : memref<4x4xi32> // rhs[k][j]
          %load_res = memref.load %res[%i, %j] : memref<4x4xi32> // res[i][j]
          %mul = arith.muli %load_lhs, %load_rhs : i32 // mul = lhs[i][k] * rhs[k][j]
          %sum = arith.addi %load_res, %mul : i32 // sum = res[i][j] + mul
          memref.store %sum, %res[%i, %j] : memref<4x4xi32> // res[i][j] = sum
          %incr = arith.addi %k, %c1 : index
          scf.yield %incr : index
        }
        %incr = arith.addi %j, %c1 : index
        scf.yield %incr : index
      }
      %incr = arith.addi %i, %c1 : index
      scf.yield %incr : index
    }
    return
  }
}
{
  "mem_0": {
    "data": [[1,2,3,4], [1,2,3,4], [1,2,3,4], [1,2,3,4]],
    "format": {
      "numeric_type": "bitnum",
      "is_signed": false,
      "width": 32
    }
  },
  "mem_1": {
    "data": [[1,2,3,4], [1,2,3,4], [1,2,3,4], [1,2,3,4]],
    "format": {
      "numeric_type": "bitnum",
      "is_signed": false,
      "width": 32
    }
  },
  "mem_2": {
    "data": [[0,0,0,0],[0,0,0,0],[0,0,0,0],[0,0,0,0]],
    "format": {
      "numeric_type": "bitnum",
      "is_signed": false,
      "width": 32
    }
  }
}
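If the matrix size changes, a data file in the same shape can be generated rather than typed by hand. This is a minimal sketch: the helper names `memory` and `matmul_data` are hypothetical, and it simply reproduces the `mem_0`/`mem_1`/`mem_2` keys and `format` fields shown above. Which memory corresponds to `%lhs`, `%rhs`, or `%res` is not stated in the thread, so that mapping is left open.

```python
import json


def memory(data, width=32, is_signed=False):
    """One memory entry in the same shape as the data file above."""
    return {
        "data": data,
        "format": {
            "numeric_type": "bitnum",
            "is_signed": is_signed,
            "width": width,
        },
    }


def matmul_data(n=4):
    """Two n x n inputs whose rows are 1..n, plus a zeroed n x n memory."""
    row = list(range(1, n + 1))
    return {
        "mem_0": memory([row[:] for _ in range(n)]),
        "mem_1": memory([row[:] for _ in range(n)]),
        "mem_2": memory([[0] * n for _ in range(n)]),
    }


# Serialize; write this string to the path passed as $2 to the script.
text = json.dumps(matmul_data(), indent=2)
```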
The very basic matrix multiply (will work after https://github.com/llvm/circt/pull/2139 and https://github.com/llvm/circt/pull/2137 are merged).
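To check the simulated output, it helps to have a software reference for the same loop nest. The sketch below mirrors the three `scf.while` loops in plain Python, using the input data above; the function name `matmul_reference` is hypothetical, and the wrap to 32 bits is an assumption that matches the `i32` element type (it does not matter for these small values).

```python
def matmul_reference(lhs, rhs, res, width=32):
    """Mirror the i/j/k while-loop nest: res[i][j] += lhs[i][k] * rhs[k][j]."""
    n = len(lhs)
    mask = (1 << width) - 1  # wrap like a fixed-width integer
    i = 0
    while i < n:
        j = 0
        while j < n:
            k = 0
            while k < n:
                mul = lhs[i][k] * rhs[k][j]
                res[i][j] = (res[i][j] + mul) & mask
                k += 1
            j += 1
        i += 1
    return res


lhs = [[1, 2, 3, 4] for _ in range(4)]
rhs = [[1, 2, 3, 4] for _ in range(4)]
res = [[0, 0, 0, 0] for _ in range(4)]

expected = matmul_reference(lhs, rhs, res)
print(expected)  # every row is [10, 20, 30, 40]
```

Each row of `lhs` sums to 10 and column `j` of `rhs` is constant at `j + 1`, so the result memory should hold `10 * (j + 1)` in every row after simulation.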
Nice job!!
I would've thought the equivalent `fud` command would work as well since we have a stage to convert to and from MLIR and Calyx?
`fud` allows for native compiler -> MLIR. I imagine using CIRCT binaries would require registering an external stage?
There's nothing special about external stages. Take a look at `fud/icarus.py`, which registers an external `FutilStage` for generating the right kind of Verilog.
Huh? I was just suggesting an external stage since CIRCT is not necessarily a core piece of the Calyx infra.
Oh sure, that's reasonable. It used to be the case a month ago that you couldn't do the same things in an external stage that you can in an internal stage but that's not true anymore.
Ah ok, thanks for explaining.
@cgyurgyik we should sync up on this if you're still interested in making things work with the SCF frontend.
For the time being, I need to focus on some work-related problems. In regards to the SCF frontend, I think the next steps will be:
`par` into the SCFToCalyx lowering pass.
After talking to @hanchenye and @stephenneuendorffer, it seems that the ScaleHLS flow (https://github.com/hanchenye/scalehls) can drive work on this. In a nutshell, ScaleHLS can compile down to `affine` dialect programs, which can be lowered to `scf` programs. Long-term, it'd be awesome to connect the ScaleHLS flow with SCFToCalyx to enable end-to-end RTL generation.
With @mikeurbach's recent work on getting pipelines simulating through Calyx, we can call this done!
@cgyurgyik a next task to work on will be getting large-ish programs from the SCF dialect. I think it’ll make a good case study especially if you use solely the calyx debugger to find issues in the generated code.