Open christiaanb opened 4 years ago
The natural way of deduplicating work across programming languages is let-binding it. E.g. in:
let foo = bar ... in
case x of
3 -> foo
4 -> foo
5 -> ...
we'd expect foo
to be calculated only once. So I guess my question is, how likely is GHC to transform this to:
case x of
3 -> bar ...
4 -> bar ...
5 -> ...
? I.e., how reliable is deduplicating work by let-binding it? My initial thought is that opt-in is OK, as long as let-binding is a reliable deduplication measure.
Given that it sounds like an improvement some of the time but not others, I would probably lean towards opt-in (unless there's some way to make the heuristic good enough for practical purposes.) When flags are better documented in #951, we can state which flags are experimental / risky so users know that it may not make things better for all designs.
There is potential for a general flag to enable these kinds of maybe it helps, maybe not optimizations by default if people want it (like how GCC has -fexpensive-optimizations
for things which add a lot to compile times), we could have something like -fclash-questionable-optimizations
or -fclash-risky-optimizations
?
@martijnbastiaan Functions cannot be shared in Clash' current[1] translation-scheme (doing so would result in dual-driving ports), so simply let-binding a function-value will not result in deduplication. Notice the extra case-expressions Clash's current transformation adds for the arguments to f
in the given example.
[1] We can imagine Clash translating Haskell descriptions to sequential circuits instead of concurrent ones (See e.g. the work done on SAFL), in that case, functions could be shared.
@martijnbastiaan although your comment does call into question whether Common Subexpression Elimination (CSE) should be optin or optout; given that CSE increases fan-out. Then again, most HDL synthesis tools do CSE afaik, and then give you a max fan-out setting; so seems that industry has concluded that CSE should be opt-out.
The aspect tracked on this github issue is not really CSE, but more about Clash' specialized version of circuit deduplication.
@christiaanb Right, got it.
I think @alex-mckenna raises an important point:
unless there's some way to make the heuristic good enough for practical purposes.
Am I right in asserting we generally want de-duplication for "large" functions, but not for "small" ones? It sounds like making a heuristic for this shouldn't be very difficult.
@martijnbastiaan @alex-mckenna that's actually not trivial, since we don't know how expressions are tech-mapped (i.e. does the FPGA have specialised DSPs for multiplications). Also, it's not just about size, but also about latency: the extra multiplexers add extra latency, which is bad if the case-expressions is on the critical path.
Another thought: would the default matter that much if we could locally override it, like GHC can with optimizations? For example:
{-# OPTS_CLASH -fno-circuit-dedup #-}
The natural way of deduplicating work across programming languages is let-binding it. [...] My initial thought is that opt-in is OK, as long as let-binding is a reliable deduplication measure.
I agree. I did not expect CλaSH to transform the let-binding to duplicate the expression; I think that is rather unfortunate.
@DigitalBrains1 Clash definitely doesn't inline let-bindings; unless it's 100% guaranteed to add no extra "work", e.g. when it's a constant. The exception to this is when the let-binding has a "non-representable" (e.g. function) type.
Ah, I didn't look closely enough at the "let-binding a function-value" bit of your comment, thanks.
In that case, since let-binding would be a reliable deduplication measure, I'd be fine with opt-in.
Current situation
Normally Clash converts
to
i.e. it deduplicates functions (and certain operators) between case-alternatives to save on area. This comes at the cost of multiplexers on the arguments of the deduplicated function (or operator), which add to the propagation delay for the entire case-expression.
So sometimes we don't want Clash to do the deduplication, i.e. it adds to the critical path. In that case we can now specifically tell Clash not to deduplicate:
where the application of
f
in the A-alternative is now explicitly not deduplicated, and thef
in the B-alternative is the only remaining application off
it also will not be deduplicated.If the C-alternative also had an application of
f
, then the applications off
in the B- and C-alternatives would have been deduplicated; i.e. there would have been two applications off
in total.Discussion
All of the above calls into question whether the current automatic deduplication should be the default; explicitly turning it off using an annotation.
Perhaps it is preferable that deduplication is actually an opt-in feature: so that readers of the code can explicitly see what functions/operators are deduplicated, and which ones are not. Especially when the (current) heuristics for deduplication don't always result in the best Quality of Results (QOR). Of course this comes at the price that un-annotated code (what beginning users will probably write) in the opt-in scenario will overall result in worse QOR.