Open christiaanb opened 4 months ago
Two multiplexers of width X
results in just as much logic as one multiplexer of width X + X
. So whether we have one case-statement over a product type, or multiple case-statements over the individual types, results in the same size circuit.
You are correct that the subject of the case-expressions is duplicated by the new code. Deduplication does lead to a higher fan-out, and is something that can be controled by deDup
and noDeDup
. I guess it doesn't hurt the complexity of this transformation too much to already do the deduplication of subject here instead of waiting for the CSE transformation to do it.
simpleMux :: Unsigned 2 -> (t,t,t,t) -> t
simpleMux sel (a,b,c,d) = case sel of { 0 -> a; 1 -> b; 2 -> c; 3 -> d }
wideSubject :: Unsigned 256 -> (t,t,t,t) -> t
wideSubject sel (a,b,c,d) = case sel of { 0 -> a; 1 -> b; 0xaabbaabbaabbaabbaabbaabbaabbaabbaabbaabbaabbaabbaabbaabbaabbaabb -> c; _ -> d }
I agree that for something like simpleMux
having many narrow muxes or one wide one is basically equivalent.
But I thought the story might be different for something like wideSubject
.
I did an experiment.
-- make one big wide mux
f :: forall t sel. (sel -> (t,t,t,t) -> t) -> sel -> (t,t,t,t) -> t
f = id
-- make many 1 bit wide muxes
g :: forall ts sel. BitPack ts => (forall t. sel -> (t,t,t,t) -> t) -> sel -> (ts,ts,ts,ts) -> ts
g mux sel (a,b,c,d) = bitCoerce @(Vec (BitSize ts) Bit) @ts $ fmap (mux sel) $ zip4 as bs cs ds
where
as = bitCoerce a :: Vec (BitSize ts) Bit
bs = bitCoerce b :: Vec (BitSize ts) Bit
cs = bitCoerce c :: Vec (BitSize ts) Bit
ds = bitCoerce d :: Vec (BitSize ts) Bit
fWide,gWide :: Unsigned 256 -> (Unsigned 40, Unsigned 40, Unsigned 40, Unsigned 40) -> Unsigned 40
fWide = f wideSubject
gWide = g wideSubject
Vivado does synthesize to them to the different muxes:
fWide:
+---Muxes :
4 Input 40 Bit Muxes := 1
4 Input 2 Bit Muxes := 1
gWide:
+---Muxes :
4 Input 2 Bit Muxes := 40
4 Input 1 Bit Muxes := 40
But in the end both had the exact same cell usage:
+------+-----+------+
| |Cell |Count |
+------+-----+------+
|1 |LUT4 | 68|
|2 |LUT5 | 36|
|3 |LUT6 | 88|
|4 |IBUF | 416|
|5 |OBUF | 40|
+------+-----+------+
I've changed the code so that it generates a single tuple again, in order to preserve sharing of the subject of the case-statement
@leonschoorl @DigitalBrains1 Could you have a look at the new version? I believe it addresses your concerns.
Create multiple selectors, one for each non-shared argument.We still use one selector in order to preserve sharing.The previous code was a big mess where we partioned arguments into shared and non-shared and then filtered the case-tree depending on whether they were part of the shared arguments or not. But then with the normalisation of type arguments, the second filter did not work properly. This then resulted in shared arguments becoming part of the tuple in the alternatives of the case-expression for the non-shared arguments.
The new code is also more robust in the sense that shared and non-shared arguments no longer need to be partioned (shared occur left-most, non-shared occur right-most). They can now be interleaved. The old code would also generate bad Core if ever type and term arguments occured interleaved, this is no longer the case for the new code.
Fixes #2628
Still TODO: