calyxir / calyx

Intermediate Language (IL) for Hardware Accelerator Generators
https://calyxir.org
MIT License
450 stars 44 forks source link

`cell-share` Needs a Timing-Based Heuristic or Cost Model #2021

Open andrewb1999 opened 3 weeks ago

andrewb1999 commented 3 weeks ago

From my current understanding, the cell-share pass does not make sharing decisions with any timing information. This results in the pass often sharing cells in a way that can reduce the max frequency of a design. For example, because of how AMC implements 2D memories right now, accesses to these memories can often be the critical timing path for our designs. Along this path there are several adders and shifters which compile to about 5-6 levels of LUT logic. If some of these adders are shared in cell-share then additional muxes are added which increases the number of levels of logic on the critical timing path.

Ideally, cell-share should have a model of timing, take in an argument for the target frequency, and use those in conjunction to make better decisions about sharing. Obviously the hard part of this is the timing model, so I'm wondering if there's some stop gap for now where we just have some heuristic based on how many operations are chained together and assign some values to come up with a chain length. We can then set a cutoff to prevent sharing for chains above a certain length. This length with also need to take into account the size of the operators in the path, as a 4 bit adder will have less propagation delay than a 32 bit adder.

If anyone is interested in taking this on, I can provide some examples where this is an issue. For now, turning off sharing doesn't hurt our resource utilization that much (as most of our designs are highly pipelined), so this isn't really on the critical path for AMC but I thought it would still be good to write this issue down somewhere.

calebmkim commented 3 weeks ago

Right now, there is a very basic hueristic you can use using the syntax: -x cell-share:bounds=x,y,z. x denotes the number of times you can share a given combinational cell (e.g., adders, subtracters, etc.), y for registers, and z for all other cells. So if you do -x cell-share:bounds=1,4,-1, it means you can't share any combinational cells, each register can be shared at most four times, and all other cells are shared as much as possible.

But, as you point out, this heuristic doesn't look at the data path/chains of the cells that are being shared, which is what is important for critical path.

I think I could take this on, but maybe in a couple weeks-- there's a bunch of other optimization related stuff I'm also working on rn. (And it seems like this isn't super urgent for you currently).