calyxir / calyx

Intermediate Language (IL) for Hardware Accelerator Generators
https://calyxir.org
MIT License

Try out Halide to Calyx flow #1585

Open rachitnigam opened 1 year ago

rachitnigam commented 1 year ago

@xerpi has been working on a Halide (to MLIR) to Calyx flow, and we should give it a try to get a sense of what we need to do to support it.

Some information on this (courtesy of @xerpi):

Other refs:

xerpi commented 1 year ago

Some instructions to get started:

1) First, install the MLIR libraries: either compile LLVM with MLIR enabled (-DLLVM_ENABLE_PROJECTS="mlir"), or install them from your distro's repositories (in Fedora, it's the mlir-devel package).

2) Build Halide: https://github.com/halide/Halide#building-halide-with-cmake

-- Using LLVMConfig.cmake in: /usr/lib64/cmake/llvm
-- Using ClangConfig.cmake in: /usr/lib64/cmake/clang
-- Using MLIRConfig.cmake in: /usr/lib64/cmake/mlir
...

Check that MLIRConfig.cmake is found when running cmake.
In my case, I also had to pass -DTARGET_WEBASSEMBLY:BOOL=OFF to CMake; otherwise, it complained about shared LLVM libraries.
I also recommend passing -DHalide_ENABLE_EXCEPTIONS:BOOL=OFF so that errors get printed.
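As a rough sketch of the configure step, the Python snippet below assembles a CMake command line with the two options mentioned above and checks a configure log for the `-- Using MLIRConfig.cmake in:` line. The directory names (`Halide`, `build`) and the helper names are assumptions for illustration; adjust them to your own checkout.

```python
# Hypothetical helpers: the two -D options come from the instructions above,
# the source/build directory names are assumptions.
EXTRA_OPTIONS = [
    "-DTARGET_WEBASSEMBLY:BOOL=OFF",        # avoids the shared-LLVM-libs complaint
    "-DHalide_ENABLE_EXCEPTIONS:BOOL=OFF",  # so that errors get printed
]


def configure_command(source_dir: str = "Halide", build_dir: str = "build") -> list:
    """Return the argv for configuring Halide with CMake."""
    return ["cmake", "-S", source_dir, "-B", build_dir, *EXTRA_OPTIONS]


def mlir_config_found(configure_log: str) -> bool:
    """True if the CMake output reports where MLIRConfig.cmake was found."""
    return any(
        line.strip().startswith("-- Using MLIRConfig.cmake in:")
        for line in configure_log.splitlines()
    )
```

To actually run it, something like `subprocess.run(configure_command(), capture_output=True, text=True)` can be used, passing its `stdout` to `mlir_config_found`.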

3) Currently, MLIR code generation is triggered manually by calling Func::compile_to_mlir. So you have to edit the Halide programs and replace the call to realize() (which compiles and runs on the host) with a call to compile_to_mlir.

rachitnigam commented 1 year ago

Thanks for adding the information @xerpi!

rachitnigam commented 1 year ago

I think this is the dissertation that describes the flow: https://upcommons.upc.edu/bitstream/handle/2117/390390/176860.pdf?sequence=2

xerpi commented 1 year ago


Indeed, that was my thesis for the project. The "interesting" part starts in Chapter 4, "Methodology".

Here's a summary of the flow:

[Figure: High-level overview of the implemented flow from Halide down to execution on Xilinx FPGAs.]

[Figure: Passes executed to convert from generic MLIR to CIRCT's hardware dialects. After the MLIR to Calyx step, human-readable Calyx code can be emitted.]

[Figure: Steps needed to export SystemVerilog targeting Xilinx devices from CIRCT's hardware dialects. First, the required Xilinx-specific wrappers are added, and then the passes to convert to SystemVerilog are executed. A kernel.xml file needed by Vitis v++ is also generated.]

rachitnigam commented 12 months ago

Oh, interesting! Did you ever use the native Calyx compiler to perform any optimizations on the design? I wonder how the resulting designs would differ.

xerpi commented 12 months ago


At the beginning of development, I did emit human-readable Calyx code and used the native Calyx compiler to check that the generated code was at least semantically correct. At that point, I had not yet implemented Halide's XRT runtime backend, so I didn't do any performance comparisons (or even visually compare the emitted RTL code).

Soon after that, I added custom Calyx operations to support vector types, and since then, unfortunately, Calyx code can no longer be emitted.

rachitnigam commented 12 months ago

Ah, got it! What are the custom vector operations that you ended up adding to Calyx? Maybe we can figure out a way to support them natively.

xerpi commented 12 months ago


To implement the Halide vector Broadcast (also called vector splat) operation, I added an equivalent node to Calyx: https://github.com/xerpi/circt/commit/53d5f7dcc115355797fc89b8c929706924ae943d#diff-257f12e7264e222a495e648498b969968a86ac197236df0433a65335da1509bf
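For readers unfamiliar with the operation, a Broadcast (splat) replicates one scalar into every lane of a vector. The Python sketch below models that semantics on a flat bit representation, assuming lane 0 occupies the least-significant bits; that layout and the function name are illustrative assumptions, not taken from the linked commit.

```python
def broadcast(scalar: int, lanes: int, lane_bits: int) -> int:
    """Splat `scalar` into every lane of a flat (lanes * lane_bits)-bit value.

    Assumption for this sketch: lane 0 sits in the least-significant bits.
    """
    mask = (1 << lane_bits) - 1
    value = scalar & mask  # truncate the scalar to the width of one lane
    flat = 0
    for lane in range(lanes):
        flat |= value << (lane * lane_bits)  # copy it into each lane position
    return flat
```

For example, `broadcast(0xAB, 4, 8)` yields the 32-bit value `0xABABABAB`.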

I also had to remove the constraint that the src and dst wires of a Calyx assign must have the same type, so that I could flatten vectors into a plain range of bits and unflatten them back (needed, for example, when reading from the memory bus into a vector, and vice versa): https://github.com/xerpi/circt/commit/53d5f7dcc115355797fc89b8c929706924ae943d#diff-129515b0cbde7eccbd6943c9b6f45d597ff0fcc25df9f45b4da60e804815e8c0
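The flatten/unflatten conversion described above (for example, reading a 32-bit memory word into a vector of four 8-bit lanes) can be modeled in Python as a pair of inverse functions. This is a minimal sketch assuming lane 0 maps to the least-significant bits; the function names are hypothetical and the actual bit ordering in the CIRCT fork may differ.

```python
def flatten(lanes: list, lane_bits: int) -> int:
    """Pack a list of lane values into one flat integer (lane 0 = low bits)."""
    mask = (1 << lane_bits) - 1
    flat = 0
    for i, value in enumerate(lanes):
        flat |= (value & mask) << (i * lane_bits)
    return flat


def unflatten(flat: int, num_lanes: int, lane_bits: int) -> list:
    """Split a flat integer back into `num_lanes` lanes of `lane_bits` bits."""
    mask = (1 << lane_bits) - 1
    return [(flat >> (i * lane_bits)) & mask for i in range(num_lanes)]
```

With this layout, `flatten([1, 2, 3, 4], 8)` gives `0x04030201`, and `unflatten` recovers the original list.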

I also had to add support for sequential-read memories (extra "read-en" signal): https://github.com/llvm/circt/pull/4857
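To illustrate what the extra read-enable signal buys, here is a hypothetical cycle-level Python model of a sequential-read memory: asserting `read_en` with an address in one cycle makes the word available on `read_data` only after the next clock edge, and the output register holds its value until the next enabled read. The port names, the one-cycle latency, and the class name are assumptions for this sketch, not the interface from the linked CIRCT PR.

```python
class SeqReadMemory:
    """Cycle-level sketch of a memory with a registered (sequential) read port."""

    def __init__(self, contents: list):
        self.mem = list(contents)
        self.read_data = 0  # output register; holds its value between reads

    def tick(self, addr: int, read_en: bool) -> None:
        """Model one clock edge: latch mem[addr] only when read_en is high."""
        if read_en:
            self.read_data = self.mem[addr]
```

Unlike a combinational memory, where the data would depend on the address in the same cycle, here a controller has to assert `read_en` and wait one cycle before consuming `read_data`.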

jiahanxie353 commented 2 months ago

Bringing back this discussion after almost a year :)

I'm working on vector support and have some related questions about vector operations.

To implement the Halide vector Broadcast (also called vector splat) operation, I added an equivalent node to Calyx: https://github.com/xerpi/circt/commit/53d5f7dcc115355797fc89b8c929706924ae943d#diff-257f12e7264e222a495e648498b969968a86ac197236df0433a65335da1509bf

@xerpi Did you get a chance to implement vector additions and vectors as return types, especially given that Calyx doesn't have vector registers?

xerpi commented 2 months ago

@jiahanxie353 Hi! I didn't get a chance to implement it properly... IIRC, what I did was to implicitly cast a flat array of N bits to a vector with K lanes (N/K bits per lane) and vice versa. The code was a horrible hack, and I wish I had had more time to implement something cleaner. I think you can find it here: https://github.com/xerpi/circt/commits/dev/xerpi/scf-to-calyx-vector-types/
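Applied to the vector-addition question, the "N flat bits viewed as K lanes" idea means a vector can live in a single ordinary N-bit register and be added lane by lane. The Python sketch below shows such a lane-wise add, assuming each lane wraps around on overflow and lane 0 is in the least-significant bits; it is an illustration with a hypothetical function name, not the code from the branch linked above.

```python
def lanewise_add(a: int, b: int, total_bits: int, lanes: int) -> int:
    """Add two flat total_bits-wide values as `lanes` independent lanes.

    Each lane is total_bits // lanes bits wide and wraps on overflow, so no
    carry leaks into the neighbouring lane.
    """
    if total_bits % lanes != 0:
        raise ValueError("total_bits must be divisible by lanes")
    lane_bits = total_bits // lanes
    mask = (1 << lane_bits) - 1
    result = 0
    for i in range(lanes):
        shift = i * lane_bits
        lane_sum = ((a >> shift) & mask) + ((b >> shift) & mask)
        result |= (lane_sum & mask) << shift  # drop the per-lane carry-out
    return result
```

For instance, `lanewise_add(0x01FF, 0x0101, 16, 2)` gives `0x0200`, whereas a plain 16-bit addition would give `0x0300` because the low lane's carry would spill into the high lane.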