calyxir / calyx

Intermediate Language (IL) for Hardware Accelerator Generators
https://calyxir.org
MIT License

Xilinx toolchain #876

Open sampsyo opened 2 years ago

sampsyo commented 2 years ago

Discussed in https://github.com/cucapra/calyx/discussions/873

Originally posted by **sampsyo** January 13, 2022

As a recreational project this winter, I poked around at our infrastructure for running programs for real on Xilinx FPGAs (which is all the incredible work of the inimitable @sgpthomas!!). I just wanted to tie together the issues I've been filing to summarize the current state of things, which might be especially relevant to @yn224.

The bottom line is: compilation is working OK, with one significant asterisk; emulation is barely starting to work; and I have not tried real FPGA execution.

* *Compilation:* As of #850, #851, #852, and #855, we are successfully producing `xclbin` files from Calyx programs. :tada:
  * [x] The big remaining problem is an issue involving multiple memories and multiple AXI interfaces, described in #853.
  * [x] The next step to be done here is to see if the `xo` generation Tcl needs different declarations of the AXI interfaces, as described in https://github.com/cucapra/calyx/issues/853#issuecomment-1006817950.
  * [ ] It would also be useful to continue trying to simplify [our Tcl script](https://github.com/cucapra/calyx/blob/master/fud/bitstream/gen_xo.tcl) to the absolute bare minimum necessary to produce an `xo` file (so we understand exactly what we're doing).
* *Emulation:* You can see emulation working in the janky "tests" I made in #866.
  * [x] However, the one tiny test I am running is not producing the right answer. (The memory state seems to be unmodified from the initial state.) One next step is to debug this stuff.
  * [x] There is also some important refactoring to be done in #872.
  * [x] We are also missing docs for the `fpga` stage, which we should write after the refactoring. (We can delete the docs for the `emulation` stage.)
* *Execution:* Basically, we need to try out real execution on actual hardware. It only really makes sense to focus on it after emulation works, but we could do some of the experimentation concurrently. This would also benefit from the refactoring in the aforementioned #872.

Of course, the end result of all this should be that we can do `fud e something.fuse --to dat --through fpga` and everything just works (and the output matches our interpreter and Verilator execution).

I also strongly believe we should maintain a focus on documenting things as thoroughly as we can possibly muster in [the appropriate chapter][xdocs]. This stuff is *so damned confusing* and under-documented that we really benefit from writing things down clearly and exhaustively along the way.

Some fun future work after everything's nailed down for an MVP:

* [ ] Let's do Intel!! I keep hearing good things about [OPAE][], which is the Intel equivalent of Xilinx's [XRT][].
* [ ] Maybe we can remove our bespoke statistics-only Vivado synthesis setup and replace it with this toolchain. Who knows

[xdocs]: https://docs.calyxir.org/fud/xilinx.html
[opae]: https://opae.github.io
[xrt]: https://github.com/Xilinx/XRT

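
The "output matches our interpreter and Verilator execution" goal, and the emulation symptom where memory looks untouched, could both be checked with something like the rough sketch below. It assumes each backend's result has already been reduced to a plain mapping from memory name to a list of values; that shape, the function names, and the sample data are all hypothetical, not the format `fud` actually emits.

```python
def differing_memories(expected, actual):
    """Return the sorted names of memories that differ or are missing.

    Both arguments are assumed to be dicts of memory name -> list of values
    (a hypothetical, simplified shape, not fud's real output format).
    """
    names = sorted(set(expected) | set(actual))
    return [name for name in names if expected.get(name) != actual.get(name)]


def matches_initial_state(initial, actual):
    """True when a run left every memory exactly as it started."""
    return actual == initial


# Made-up data for a vector-add style program.
initial = {"A": [1, 2, 3], "B": [4, 5, 6], "Sum": [0, 0, 0]}
reference = {"A": [1, 2, 3], "B": [4, 5, 6], "Sum": [5, 7, 9]}
emulated = dict(initial)  # models the "memory unmodified" symptom

print(differing_memories(reference, emulated))
print(matches_initial_state(initial, emulated))
```

With this made-up data, the first call reports only `Sum` as differing, and the second confirms the run left memory in its initial state, which is the failure mode described in the emulation bullet.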
yn224 commented 2 years ago

[01/26] I have verified that declaring as many AXI interfaces as there are external memory declarations in the futil file solves the problem with generating xclbin files. For instance, if the example file includes 3 external memories, then we can declare m0_axi, m1_axi, and m2_axi.

The examples I used include the memory tutorial, a modified memory tutorial (mem_tut_dup.txt) (where I basically duplicated the logic to have 2 different memories), and vectorized-add. I also tested the case where there are more AXI declarations than external memory declarations, and that also seems to work fine.
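
As a rough illustration of the naming rule above (one `m<i>_axi` interface per external memory, with spares being harmless), here is a small sketch. The helper names are hypothetical and this is not how the `xo` generation Tcl or `fud` computes the declarations; it only mirrors the counting described in this comment.

```python
def axi_interface_names(num_external_memories, extra=0):
    """Return one AXI interface name per external memory, plus optional spares.

    Hypothetical helper: names follow the m0_axi, m1_axi, ... pattern
    mentioned in this comment.
    """
    if num_external_memories < 0 or extra < 0:
        raise ValueError("counts must be non-negative")
    return [f"m{i}_axi" for i in range(num_external_memories + extra)]


def covers_all_memories(interface_names, num_external_memories):
    """True when there are at least as many interfaces as external memories."""
    return len(interface_names) >= num_external_memories


print(axi_interface_names(3))  # ['m0_axi', 'm1_axi', 'm2_axi']
print(covers_all_memories(axi_interface_names(2, extra=1), 2))  # True
```

The `extra` parameter models the case I tested where there are more AXI declarations than external memories.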

sampsyo commented 2 years ago

@yn224, I'm moving discussion to #853, which is the issue about this specific problem.

rachitnigam commented 2 years ago

@sampsyo should we close this/re-evaluate once #1153 is merged and @nathanielnrn's work over the summer is complete?

sampsyo commented 2 years ago

Certainly time to re-evaluate, given all this progress! I checked off a few things. The stuff to be re-categorized (put on a roadmap somewhere, factored out into another issue, etc.) includes trying to simplify the relevant Tcl script, future work on Intel, and removing the special statistics-only stages in fud.

Guru1904 commented 1 year ago

https://github.com/Xilinx/embeddedsw/issues/225#issue-1415371661

Can you help me with this, please??