parthsarkar17 commented 2 months ago

First step / end goals of performance dashboard?

What benchmarks would we use? (needed to test metrics we produce on the dashboard):

systolic arrays
- examples: 1d/2d convolve
pifo trees
polybench benchmarks
amc benchmarks
TVM ML benchmarks
NTT
Filament designs, compiled to Calyx
calyx numbers
new benchmarks?
EDIT: most important are systolic arrays, polybench benchmarks, NTT, TVM ML benchmarks

How do we generate Calyx designs (as an end goal)?

i.e. how do we expect users to use the dashboard? directly from their high-level designs that target calyx, or explicitly generate calyx first?
EDIT: after speaking with @rachitnigam, we decided that this tool will be used internally to get performance metrics on new Calyx versions. That means the dashboard will accept Calyx designs directly and will not need to support external users who want to visualize the performance of designs written in a higher-level language.

What are some metrics people care about?

Critical path + frequency
Resource usage: FPGA has limited resources; how many does your design need to use?
- LUTs
- DSPs
- registers
- memory
FSM size?
Calyx compile time
Anything else important?
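
To make the list above concrete, here is a minimal sketch of what one dashboard record could look like. Everything in it is an assumption for discussion: the `DesignMetrics` name, the field names, the use of state count as a proxy for "FSM size", and the simplification that maximum frequency is just the reciprocal of the critical path. The numbers are illustrative placeholders, not measurements.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class DesignMetrics:
    """Hypothetical record: one benchmark measured at one compiler version."""
    benchmark: str
    critical_path_ns: float  # worst-case path delay reported by synthesis
    luts: int
    dsps: int
    registers: int
    memory_blocks: int       # assumed unit: number of block RAMs
    fsm_states: int          # assumed proxy for "FSM size"
    compile_time_s: float    # wall-clock time of the Calyx compiler

    @property
    def max_frequency_mhz(self) -> float:
        # Simplification: f_max = 1 / critical path.
        # 1 / ns is GHz, so scale by 1000 to get MHz.
        return 1000.0 / self.critical_path_ns


# Placeholder values chosen only to exercise the code.
row = DesignMetrics(
    benchmark="example-design",
    critical_path_ns=5.0,
    luts=1200,
    dsps=4,
    registers=800,
    memory_blocks=2,
    fsm_states=16,
    compile_time_s=0.42,
)
print(row.max_frequency_mhz)
print(json.dumps(asdict(row)))
```

Storing the raw critical path and deriving frequency keeps the record free of redundant fields; an ASIC flow would likely need different resource fields than LUTs/DSPs.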

Adrian mentioned targeting ASICs in addition to FPGAs

Future use of the dashboard?

get feedback on how an optimization performs <-- goal for now
get feedback on how calyx performs w.r.t other compilers (with external baseline)
feedback-directed? <-- for this, wouldn't we need to be prepared for higher-level designs (and not simply individual Calyx files)?
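
For the "how does an optimization perform" goal, the core computation is probably just a per-metric comparison between a baseline run and a run with the optimization enabled. A rough sketch, assuming metrics arrive as flat name-to-number dictionaries (the `compare` and `percent_change` helpers and the metric names are hypothetical, and the values are placeholders):

```python
from __future__ import annotations


def percent_change(baseline: float, candidate: float) -> float:
    """Relative change from baseline to candidate, in percent (negative means smaller)."""
    if baseline == 0:
        raise ValueError("baseline must be nonzero")
    return (candidate - baseline) / baseline * 100.0


def compare(baseline: dict[str, float], candidate: dict[str, float]) -> dict[str, float]:
    """Percent change for every metric that appears in both runs."""
    shared = baseline.keys() & candidate.keys()
    return {name: percent_change(baseline[name], candidate[name]) for name in sorted(shared)}


# Placeholder numbers: the candidate uses fewer LUTs but has a longer critical path.
before = {"luts": 1000, "critical_path_ns": 5.0}
after = {"luts": 900, "critical_path_ns": 5.5}
for name, delta in compare(before, after).items():
    print(f"{name}: {delta:+.1f}%")
```

Only metrics present in both runs are compared, so adding a new metric later would not break comparisons against older data.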

sampsyo commented 2 months ago

Awesome; thanks for planning out some of this stuff! Here are just a couple of assorted thoughts:

> i.e. how do we expect users to use the dashboard? directly from their high-level designs that target calyx, or explicitly generate calyx first?

I think a good "v1.0" goal would be to start with pre-generated Calyx code. This would, of course, simplify the logistics a lot: all the automation could start at the Calyx level and not deal with the N different potential frontend toolchains that could generate Calyx code. (In practice, getting N different frontends going, with all their various dependencies and stuff, has proven to be a huge headache for automation CI. Skipping all that seems great.)
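
As a sketch of what "start at the Calyx level" could mean for the automation: collect checked-in Calyx files from a directory and time a command on each. This assumes designs use the `.futil` extension; `find_designs` and `time_command` are hypothetical helpers, and the command is passed in by the caller rather than hard-coded, since the exact compiler invocation is not settled here. The demo below times a no-op Python process as a stand-in.

```python
from __future__ import annotations

import subprocess
import sys
import tempfile
import time
from pathlib import Path


def find_designs(root: Path) -> list[Path]:
    """All pre-generated Calyx files under root (assumes the .futil extension)."""
    return sorted(root.rglob("*.futil"))


def time_command(cmd: list[str]) -> float:
    """Wall-clock seconds for one run of cmd; raises CalledProcessError on failure."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    return time.perf_counter() - start


# Demo with a throwaway directory and a stand-in command (not a real compile).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.futil").write_text("// placeholder design\n")
    (Path(tmp) / "notes.txt").write_text("ignored\n")
    found = [p.name for p in find_designs(Path(tmp))]

elapsed = time_command([sys.executable, "-c", "pass"])
print(found, f"{elapsed:.3f}s")
```

A single wall-clock sample is noisy; a real harness would probably repeat each run and report a median.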

At some later version, maybe we can revisit this and go all the way from frontend source code. This would help us compare different toolchains more meaningfully (e.g., Vitis, XLS?). But I think that could very much count as a v2.0 feature!

> get feedback on how calyx performs w.r.t other compilers (with external baseline)

Indeed, this seems like cool work for the future. We can start with it all being just Calyx-vs.-Calyx, but after that, it makes sense to broaden things out and support other ADL-ish compilers. Having a robust benchmark suite would be a huge contribution to the community.

> Adrian mentioned targeting ASICs in addition to FPGAs

I guess the main reasons I suggested this are:

Automatically running the Xilinx toolchain seems like a huge pain, because of license keys/etc. Running the open-source ASIC EDA flows may be easier?
For external comparisons, it might yield more stable/predictable results.

But of course, we have spent 0 time optimizing stuff for ASICs, and we collectively have no expertise in running these toolchains. So that would be a good reason not to do this in the near future.

parthsarkar17 commented 2 months ago

> I think a good "v1.0" goal would be to start with pre-generated Calyx code. This would, of course, simplify the logistics a lot: all the automation could start at the Calyx level and not deal with the N different potential frontend toolchains that could generate Calyx code.

Definitely, that makes sense as a place to start, especially because our impression is that it would mainly be used to get concrete measurements of how much some optimization to the compiler improves real-world performance.

In addition to the toolchain comparisons that you mentioned (and this is me throwing stuff out for the sake of discussion), if we go down the path of feedback-directed optimization, accepting different frontends would help us provide specific optimizations to various users, which would be awesome. Again, super far out, but maybe worth starting to think about now? I don't know exactly what compilation units frontends produce (and I'm assuming each one produces a different set), but maybe we could start thinking about a way to generalize to any number/type of frontend-generated Calyx files? That said, it's definitely important to set sights on V1 for now :)

calyxir / calyx

Performance Dashboard Planning #1960
