clash-lang / clash-compiler

Haskell to VHDL/Verilog/SystemVerilog compiler
https://clash-lang.org/
Other
1.42k stars 150 forks source link

Circuit deduplication, opt-in or opt-out? #954

Open christiaanb opened 4 years ago

christiaanb commented 4 years ago

Current situation

Normally Clash converts

case x of
  A -> f 3 y
  B -> f x x
  C -> h x

to

let f_arg0 = case x of {A -> 3; _ -> x}
    f_arg1 = case x of {A -> y; _ -> x}
    f_out  = f f_arg0 f_arg1
in  case x of
      A -> f_out
      B -> f_out
      C -> h x

i.e. it deduplicates functions (and certain operators) between case-alternatives to save on area. This comes at the cost of multiplexers on the arguments of the deduplicated function (or operator), which add to the propagation delay for the entire case-expression.

So sometimes we don't want Clash to do the deduplication, i.e. it adds to the critical path. In that case we can now specifically tell Clash not to deduplicate:

case x of
  A -> noDeDup f 3 y
  B -> f x y
  C -> h x

where the application of f in the A-alternative is now explicitly not deduplicated, and the f in the B-alternative is the only remaining application of f it also will not be deduplicated.

If the C-alternative also had an application of f, then the applications of f in the B- and C-alternatives would have been deduplicated; i.e. there would have been two applications of f in total.

Discussion

All of the above calls into question whether the current automatic deduplication should be the default; explicitly turning it off using an annotation.

Perhaps it is preferable that deduplication is actually an opt-in feature: so that readers of the code can explicitly see what functions/operators are deduplicated, and which ones are not. Especially when the (current) heuristics for deduplication don't always result in the best Quality of Results (QOR). Of course this comes at the price that un-annotated code (what beginning users will probably write) in the opt-in scenario will overall result in worse QOR.

martijnbastiaan commented 4 years ago

The natural way of deduplicating work across programming languages is let-binding it. E.g. in:

let foo = bar ... in
case x of
  3 -> foo
  4 -> foo
  5 -> ...

we'd expect foo to be calculated only once. So I guess my question is, how likely is GHC to transform this to:

case x of
  3 -> bar ...
  4 -> bar ...
  5 -> ...

? I.e., how reliable is deduplicating work by let-binding it? My initial thought is that opt-in is OK, as long as let-binding is a reliable deduplication measure.

alex-mckenna commented 4 years ago

Given that it sounds like an improvement some of the time but not others, I would probably lean towards opt-in (unless there's some way to make the heuristic good enough for practical purposes.) When flags are better documented in #951, we can state which flags are experimental / risky so users know that it may not make things better for all designs.

There is potential for a general flag to enable these kinds of maybe it helps, maybe not optimizations by default if people want it (like how GCC has -fexpensive-optimizations for things which add a lot to compile times), we could have something like -fclash-questionable-optimizations or -fclash-risky-optimizations?

christiaanb commented 4 years ago

@martijnbastiaan Functions cannot be shared in Clash' current[1] translation-scheme (doing so would result in dual-driving ports), so simply let-binding a function-value will not result in deduplication. Notice the extra case-expressions Clash's current transformation adds for the arguments to f in the given example.

[1] We can imagine Clash translating Haskell descriptions to sequential circuits instead of concurrent ones (See e.g. the work done on SAFL), in that case, functions could be shared.

christiaanb commented 4 years ago

@martijnbastiaan although your comment does call into question whether Common Subexpression Elimination (CSE) should be optin or optout; given that CSE increases fan-out. Then again, most HDL synthesis tools do CSE afaik, and then give you a max fan-out setting; so seems that industry has concluded that CSE should be opt-out.

The aspect tracked on this github issue is not really CSE, but more about Clash' specialized version of circuit deduplication.

martijnbastiaan commented 4 years ago

@christiaanb Right, got it.

I think @alex-mckenna raises an important point:

unless there's some way to make the heuristic good enough for practical purposes.

Am I right in asserting we generally want de-duplication for "large" functions, but not for "small" ones? It sounds like making a heuristic for this shouldn't be very difficult.

christiaanb commented 4 years ago

@martijnbastiaan @alex-mckenna that's actually not trivial, since we don't know how expressions are tech-mapped (i.e. does the FPGA have specialised DSPs for multiplications). Also, it's not just about size, but also about latency: the extra multiplexers add extra latency, which is bad if the case-expressions is on the critical path.

martijnbastiaan commented 4 years ago

Another thought: would the default matter that much if we could locally override it, like GHC can with optimizations? For example:

{-# OPTS_CLASH -fno-circuit-dedup #-}
DigitalBrains1 commented 4 years ago

The natural way of deduplicating work across programming languages is let-binding it. [...] My initial thought is that opt-in is OK, as long as let-binding is a reliable deduplication measure.

I agree. I did not expect CλaSH to transform the let-binding to duplicate the expression; I think that is rather unfortunate.

christiaanb commented 4 years ago

@DigitalBrains1 Clash definitely doesn't inline let-bindings; unless it's 100% guaranteed to add no extra "work", e.g. when it's a constant. The exception to this is when the let-binding has a "non-representable" (e.g. function) type.

DigitalBrains1 commented 4 years ago

Ah, I didn't look closely enough at the "let-binding a function-value" bit of your comment, thanks.

In that case, since let-binding would be a reliable deduplication measure, I'd be fine with opt-in.