clash-lang / clash-compiler

Haskell to VHDL/Verilog/SystemVerilog compiler
https://clash-lang.org/
Other
1.4k stars 147 forks source link

Refactor DEC transformation #2668

Open christiaanb opened 4 months ago

christiaanb commented 4 months ago

Create multiple selectors, one for each non-shared argument. We still use one selector in order to preserve sharing.

The previous code was a big mess where we partioned arguments into shared and non-shared and then filtered the case-tree depending on whether they were part of the shared arguments or not. But then with the normalisation of type arguments, the second filter did not work properly. This then resulted in shared arguments becoming part of the tuple in the alternatives of the case-expression for the non-shared arguments.

The new code is also more robust in the sense that shared and non-shared arguments no longer need to be partioned (shared occur left-most, non-shared occur right-most). They can now be interleaved. The old code would also generate bad Core if ever type and term arguments occured interleaved, this is no longer the case for the new code.

Fixes #2628

Still TODO:

christiaanb commented 4 months ago

Two multiplexers of width X results in just as much logic as one multiplexer of width X + X. So whether we have one case-statement over a product type, or multiple case-statements over the individual types, results in the same size circuit.

You are correct that the subject of the case-expressions is duplicated by the new code. Deduplication does lead to a higher fan-out, and is something that can be controled by deDup and noDeDup. I guess it doesn't hurt the complexity of this transformation too much to already do the deduplication of subject here instead of waiting for the CSE transformation to do it.

leonschoorl commented 4 months ago
simpleMux :: Unsigned 2 -> (t,t,t,t) -> t
simpleMux sel (a,b,c,d) = case sel of { 0 -> a; 1 -> b; 2 -> c; 3 -> d }

wideSubject :: Unsigned 256 -> (t,t,t,t) -> t
wideSubject sel (a,b,c,d) = case sel of { 0 -> a; 1 -> b; 0xaabbaabbaabbaabbaabbaabbaabbaabbaabbaabbaabbaabbaabbaabbaabbaabb -> c; _ -> d }

I agree that for something like simpleMux having many narrow muxes or one wide one is basically equivalent.

But I thought the story might be different for something like wideSubject.

leonschoorl commented 4 months ago

I did an experiment.

-- make one big wide mux
f :: forall t sel. (sel -> (t,t,t,t) -> t) -> sel -> (t,t,t,t) -> t
f = id

-- make many 1 bit wide muxes
g :: forall ts sel. BitPack ts => (forall t. sel -> (t,t,t,t) -> t) -> sel -> (ts,ts,ts,ts) -> ts
g mux sel (a,b,c,d) = bitCoerce @(Vec (BitSize ts) Bit) @ts $ fmap (mux sel) $ zip4 as bs cs ds
 where
  as = bitCoerce a :: Vec (BitSize ts) Bit
  bs = bitCoerce b :: Vec (BitSize ts) Bit
  cs = bitCoerce c :: Vec (BitSize ts) Bit
  ds = bitCoerce d :: Vec (BitSize ts) Bit

fWide,gWide :: Unsigned 256 -> (Unsigned 40, Unsigned 40, Unsigned 40, Unsigned 40) -> Unsigned 40
fWide =  f wideSubject
gWide =  g wideSubject

Vivado does synthesize to them to the different muxes:

fWide:
+---Muxes : 
       4 Input   40 Bit        Muxes := 1     
       4 Input    2 Bit        Muxes := 1     

gWide:
+---Muxes : 
       4 Input    2 Bit        Muxes := 40    
       4 Input    1 Bit        Muxes := 40   

But in the end both had the exact same cell usage:

+------+-----+------+
|      |Cell |Count |
+------+-----+------+
|1     |LUT4 |    68|
|2     |LUT5 |    36|
|3     |LUT6 |    88|
|4     |IBUF |   416|
|5     |OBUF |    40|
+------+-----+------+
christiaanb commented 4 months ago

I've changed the code so that it generates a single tuple again, in order to preserve sharing of the subject of the case-statement

christiaanb commented 2 months ago

@leonschoorl @DigitalBrains1 Could you have a look at the new version? I believe it addresses your concerns.