clash-lang / clash-compiler

Haskell to VHDL/Verilog/SystemVerilog compiler
https://clash-lang.org/
Other
1.4k stars 147 forks source link

Clash unnecessarily unrolls `replicate` when used with `blockRam` #621

Open rowanG077 opened 5 years ago

rowanG077 commented 5 years ago

Hi!

I'm using clash 0.99.3.

I'm working on implementing a fifo buffer using the blockrams that can have a read port and a write port that have have different widths. The efficiency of the usage of blockram elements on my FPGA are dependent on the size of the port that is chosen. To optimize this I calculate at the type level what an oke port width would be so I can still efficiently use the memory. See below for the types I create.

type FactorGT a b = FactorGTRaw a b (CmpNat (Mod b a) 0)

type family FactorGTRaw a b v where
    FactorGTRaw a b 'EQ = a
    FactorGTRaw a b _ = FactorGT (a+1) b

-- Calculate efficient blockram usage
type FPGABlockRams = 2700
type FPGABlockRamBasePortWidth = 5
type FPGABlockRamBaseMultiplier = 2048
type FPGABlockRamSingleElement = FPGABlockRamBasePortWidth * FPGABlockRamBaseMultiplier
type FPGABlockRamBits = FPGABlockRams * FPGABlockRamSingleElement

type FifoBitsAddressSize = CLog 2 FPGABlockRamBits

type FifoBlockRamPortWidth writePortWidth readPortWidth
   = FactorGT
       ( FPGABlockRamBasePortWidth 
       * DivRU (Max writePortWidth readPortWidth) 
               FPGABlockRamBasePortWidth
       )
       FPGABlockRamSingleElement

I then go on to use these types in my fifo function. In clashi this works fine and I can fully test my FiFo. However when compiling it will take forever. Something I tested is that if I change

type FifoBlockRamPortWidth writePortWidth readPortWidth
   = FactorGT
       ( FPGABlockRamBasePortWidth 
       * DivRU (Max writePortWidth readPortWidth) 
               FPGABlockRamBasePortWidth
       )
       FPGABlockRamSingleElement

to

type FifoBlockRamPortWidth writePortWidth readPortWidth = 160

Which is the value that is calculated during compilation for a writePortWidth of 128 and a readPortWidth of 24 which I use in my topEntity. Then the compilation errors out with the message Cannot reduce an integer.

Below is a complete example with topEntity that shows this issue.

{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE UndecidableInstances #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE StandaloneDeriving #-}

module MinimalExample where

import Clash.Prelude

import Data.Maybe
import GHC.Generics (Generic)
import Control.DeepSeq

type FactorGT a b = FactorGTRaw a b (CmpNat (Mod b a) 0)

type family FactorGTRaw a b v where
    FactorGTRaw a b 'EQ = a
    FactorGTRaw a b _ = FactorGT (a+1) b

-- Calculate efficient blockram usage
type FPGABlockRams = 2700
type FPGABlockRamBasePortWidth = 5
type FPGABlockRamBaseMultiplier = 2048
type FPGABlockRamSingleElement = FPGABlockRamBasePortWidth * FPGABlockRamBaseMultiplier
type FPGABlockRamBits = FPGABlockRams * FPGABlockRamSingleElement

type FifoBitsAddressSize = CLog 2 FPGABlockRamBits

type FifoBlockRamPortWidth writePortWidth readPortWidth
   = 160

type FifoBlockRamPortWidthAddressSize writePortWidth readPortWidth
  = 1 + CLog 2 (FifoBlockRamPortWidth writePortWidth readPortWidth)

type FifoBlockRamWords writePortWidth readPortWidth
   = Div 
       FPGABlockRamBits
       (FifoBlockRamPortWidth writePortWidth readPortWidth)

type FifoWordsAddressSize writePortWidth readPortWidth
   = 1 + CLog 2 (FifoBlockRamWords writePortWidth readPortWidth)

data FifoState writePortWidth readPortWidth 
   = FifoStateDetermineFifoDepth

data FifoInput writePortWidth readPortWidth
   = FifoInput
     { fifoPop           :: Bool
     , fifoWriteRequest  :: Maybe (BitVector writePortWidth)
     , requiredFifoSize  :: Unsigned FifoBitsAddressSize
     } deriving (Show, Generic, NFData)

data FifoOutput writePortWidth readPortWidth
   = FifoOutput
     { fifoData          :: Maybe (BitVector readPortWidth)
     , fifoReadCapacity  :: Unsigned FifoBitsAddressSize
     , fifoWriteCapacity :: Unsigned FifoBitsAddressSize
     } deriving (Show, Generic, NFData)

data MealyFifoInput writePortWidth readPortWidth
  = MealyFifoInput
    { fifoInput          :: FifoInput writePortWidth readPortWidth
    , memData            :: BitVector (FifoBlockRamPortWidth writePortWidth readPortWidth)
    } deriving (Generic, NFData)

deriving instance
  ( KnownNat writePortWidth
  , KnownNat (FifoBlockRamPortWidth writePortWidth readPortWidth)
  ) => Show (MealyFifoInput writePortWidth readPortWidth)

data MealyFifoOutput writePortWidth readPortWidth
  = MealyFifoOutput
    { fifoOutput         :: FifoOutput writePortWidth readPortWidth
    , fifoMemReadAddr    :: Unsigned (FifoWordsAddressSize writePortWidth readPortWidth)
    , fifoMemWrite       :: Maybe
                            ( Unsigned (FifoWordsAddressSize writePortWidth readPortWidth)
                            , BitVector (FifoBlockRamPortWidth writePortWidth readPortWidth)
                            )
    } deriving (Generic, NFData)

deriving instance
  ( KnownNat readPortWidth
  , KnownNat (FifoBlockRamPortWidth writePortWidth readPortWidth)
  ) => Show (MealyFifoOutput writePortWidth readPortWidth)

fifo
  :: forall (writePortWidth   :: Nat)
            (readPortWidth    :: Nat)
            (portWidth        :: Nat)
            (wordsAddressSize :: Nat)
            (fifoSizeWords    :: Nat)
            (domain           :: Domain)
            (gated            :: ClockKind)
            (synchronous      :: ResetKind)
  . ( HiddenClockReset domain gated synchronous
    , KnownNat writePortWidth
    , KnownNat readPortWidth
    , KnownNat portWidth
    , KnownNat wordsAddressSize
    , KnownNat fifoSizeWords
    , portWidth ~ FifoBlockRamPortWidth writePortWidth readPortWidth
    , wordsAddressSize ~ FifoWordsAddressSize writePortWidth readPortWidth
    , fifoSizeWords ~ FifoBlockRamWords writePortWidth readPortWidth
    , 1 <= portWidth
    , 1 <= fifoSizeWords
    )
  => Signal domain (FifoInput writePortWidth readPortWidth)
  -> Signal domain (FifoOutput writePortWidth readPortWidth)
fifo input = fifoOutput <$> mealyOutput
  where
    mem :: HiddenClock domain gated
        => Signal domain (Unsigned wordsAddressSize)
        -> Signal domain (Maybe (Unsigned wordsAddressSize, BitVector portWidth))
        -> Signal domain (BitVector portWidth)
    mem = blockRam $ replicate (SNat :: SNat fifoSizeWords) 0

    initialState :: FifoState writePortWidth readPortWidth
    initialState = FifoStateDetermineFifoDepth

    mealyInput = MealyFifoInput <$> input <*> memData
    mealyOutput = mealy fifoTransition initialState mealyInput

    memReadAddr = fifoMemReadAddr <$> mealyOutput
    memWrite = fifoMemWrite <$> mealyOutput
    memData = mem memReadAddr memWrite

fifoTransition FifoStateDetermineFifoDepth input = (FifoStateDetermineFifoDepth, undefined)

{-# NOINLINE topEntity #-}
{-# ANN topEntity
  (Synthesize
  { t_name   = "MinimalExample"
  , t_inputs =  [ PortName "clk"
                , PortName "rst"
                , PortProduct ""
                  [ PortName "pop"
                  , PortName "writeRequest"
                  , PortName "writeData" ] ]
  , t_output = PortProduct ""
                  [ PortName "dataAvailable"
                  , PortName "data"
                  , PortName "readCapacity"
                  , PortName "writeCapacity" ]
  }) #-}
topEntity
  :: Clock System 'Source
  -> Reset System 'Asynchronous
  -> Signal System Bool
  -> Signal System Bool
  -> Signal System (BitVector 128)
  -> Signal System (Bool, BitVector 24, Unsigned 32, Unsigned 32)
topEntity clk rst pop writeRequest writeData = bundle (dataValid, readData, readCapacity, writeCapacity)
  where
    toMaybe cond x = if cond then Just x else Nothing
    writeDataM = toMaybe <$> writeRequest <*> writeData

    fifoInput :: Signal System (FifoInput 128 24)
    fifoInput = FifoInput <$> pop <*> writeDataM <*> pure 100    
    fifoOutput = exposeClockReset fifo clk rst fifoInput

    dataValid = isJust <$> fifoData <$> fifoOutput
    readData = maybe 0 id <$> fifoData <$> fifoOutput 
    readCapacity = resize <$> fifoReadCapacity <$> fifoOutput
    writeCapacity = resize <$> fifoWriteCapacity <$> fifoOutput
leonschoorl commented 5 years ago

When run with -fclash-debug DebugName

deadcode
deadcode
deadcode
deadcode
deadcode
deadcode
applicationPropagation
caseCon
elemExistentials
inlineWorkFree
applicationPropagation
caseCon
bindConstantVar
caseCon
inlineSmall
applicationPropagation
inlineSmall
applicationPropagation
inlineSmall
applicationPropagation
bindConstantVar
bindOrLiftNonRep
applicationPropagation
removeUnusedExpr
removeUnusedExpr

reduceNonRepPrim

inlineWorkFree
bindConstantVar
...

And then seems to repeat inlineWorkFree and bindConstantVar forever.


When run with -fclash-debug DebugApplied Clash hangs trying to print the input term to reduceNonRepPrim

rowanG077 commented 5 years ago

@leonschoorl I ran with -fclash-debug DebugName as per @martijnbastiaan suggestion on google groups and for me it's stuck on a bindOrLiftNonRep step.

christiaanb commented 5 years ago

@leonschoorl: On master the issue is that Clash unrolls replicate (SNat @ fifoSizeWords) 0 as the argument to blockRam, where fifoSizeWords is quite large: 172800.

If I hardcode Clash not to do that, compilation is successful on master. The reason Clash wants to unroll it is because the first argument to blockRam is its initial state, which must be evaluated to a literal because of restrictions in the grammar of the HDL.

However, in this case, the HDL generated by the blackbox for replicate actually adheres to the grammar. We need to find a way to encode this, so Clash won’t unroll the replicate.

p.s. I discovered this by compiling with DebugAll, and saw that compilation halted for a bit trying to apply reduceNonRepPrim to the replicate in question.

rowanG077 commented 5 years ago

@christiaanb All right thank you!

Do you have any idea when this could be fixed? Not to pressure you guys but I need to decide whether I will implement alternatives or wait for this issue to be fixed.

martijnbastiaan commented 5 years ago

No comment on when it'll be fixed, but I can write a small workaround for you. Are you using VHDL, Verilog, or SystemVerilog?

rowanG077 commented 5 years ago

Thank you! I'm using VHDL.

martijnbastiaan commented 5 years ago

I can't really test the hack, as I've botched my Clash 0.99.3 installation, but I believe this should work. I've probably made a number of small typos / forgot imports. In your current working directory, put BlockRam1.json and BlockHack.hs. You should be able to use blockRam1 from there without any troubles. Let me know!

rowanG077 commented 5 years ago

@martijnbastiaan Thank you very much! There where two problems but I could fix them myself. I changed:

blockRam1 = \n a rd wrM -> withFrozenCallStack
  (hideClock E.blockRam1E n a rd wrM)

to

blockRam1 = \n a rd wrM -> withFrozenCallStack
  (hideClock blockRam1E n a rd wrM)

and blockRam1# was missing an argument

in  withFrozenCallStack
      (blockRam1# clk (replicate n a) (fromEnum <$> rd) en (fromEnum <$> wr) din)

changed to

  in  withFrozenCallStack
      (blockRam1# clk (replicate n a) a (fromEnum <$> rd) en (fromEnum <$> wr) din)

I haven't yet tested further then that it loads in clashi and can do this at the earliest next week but very nice of you guys to help so quickly :+1: !

rowanG077 commented 5 years ago

Since my original design didn't compile in clash 0.99.3 I have updated the version we use internally to the current master branch with some help from @leonschoorl. I had to modify the workaround primitive created by @martijnbastiaan sligthly to make it compile. I added the kind key with value Declaration and changed templateD key to template. See the gists below for a complete set which doesn't compile it's very similar to the one I posted at the start of this thread. I just replaced blockram with the workaround and added deriving Undefined to the data types. Loading in clashi succeeds.

MinimalExample.hs blockramhack.hs blockram1.json compiler output with -fclash-debug DebugApplied

martijnbastiaan commented 5 years ago

@rowanG077 I've added blockRam1 in https://github.com/clash-lang/clash-compiler/pull/645. Hopefully it will be merged soon.

rowanG077 commented 5 years ago

@martijnbastiaan

The compiler still hangs even using your blockram workaround.

rowanG077 commented 5 years ago

Actually I think there is another issue besides the blockram workaround not working. If I set the blockram to a very small size so it should compile it still doesn't work.

However If I replace

type FifoBlockRamPortWidth writePortWidth readPortWidth
   = FactorGT
       ( FPGABlockRamBasePortWidth 
       * DivRU (Max writePortWidth readPortWidth) 
               FPGABlockRamBasePortWidth
       )
       FPGABlockRamSingleElement

with

type FifoBlockRamPortWidth writePortWidth readPortWidth
   = 160

and also remove the undefined from fifoTransition

fifoTransition FifoStateDetermineFifoDepth input = (FifoStateDetermineFifoDepth, output)
  where
    output = MealyFifoOutput (FifoOutput Nothing 0 0) 0 Nothing

The code compiles. When I then change FifoBlockRamPortWidth

type FifoBlockRamPortWidth writePortWidth readPortWidth
   = 160

back to

type FifoBlockRamPortWidth writePortWidth readPortWidth
   = FactorGT
    ( FPGABlockRamBasePortWidth 
    * DivRU (Max writePortWidth readPortWidth) 
            FPGABlockRamBasePortWidth
    )
    FPGABlockRamSingleElement

Then it won't finish compiling. It gets stuck the same way as the previous gist I posted.

christiaanb commented 5 years ago

@martijnbastiaan @leonschoorl Although not an issue in this use-case (or at not least, not yet), https://github.com/clash-lang/clash-compiler/blob/39d9dbf3fd71ed5c644a8a71f50dfbedfbd33b1f/clash-lib/src/Clash/Core/Type.hs#L436-L437 is missing an implementation for http://hackage.haskell.org/package/base-4.12.0.0/docs/GHC-TypeNats.html#t:CmpNat . This could become an issue, because if Clash cannot reduce some stuff to a type-level literal, it is deemed unrepresentable.

leonschoorl commented 5 years ago

The BlockHack @martijnbastiaan provided here didn't work correctly, it still had the same fundamental problem of wanting to to create a huge Vec at compile time.

But there is now a proper implementation of blockRam1 was added to master in https://github.com/clash-lang/clash-compiler/commit/39d9dbf3fd71ed5c644a8a71f50dfbedfbd33b1f Only master has had a big API change in the meantime, and we need to change you example a little:

--- MinimalExample.hs.orig  2019-06-26 15:11:09.129281531 +0200
+++ MinimalExample.hs   2019-06-26 15:21:00.289273674 +0200
@@ -10,7 +10,6 @@
 module MinimalExample where

 import Clash.Prelude
-import BlockRamHack

 import Data.Maybe
 import GHC.Generics (Generic)
@@ -101,9 +100,9 @@
             (wordsAddressSize :: Nat)
             (fifoSizeWords    :: Nat)
             (domain           :: Domain)
-            (gated            :: ClockKind)
+            (conf             :: DomainConfiguration)
             (synchronous      :: ResetKind)
-  . ( HiddenClockReset domain gated synchronous
+  . ( HiddenClockResetEnable domain conf
     , KnownNat writePortWidth
     , KnownNat readPortWidth
     , KnownNat portWidth
@@ -119,11 +118,11 @@
   -> Signal domain (FifoOutput writePortWidth readPortWidth)
 fifo input = fifoOutput <$> mealyOutput
   where
-    mem :: HiddenClock domain gated
+    mem :: HiddenClock domain conf
         => Signal domain (Unsigned wordsAddressSize)
         -> Signal domain (Maybe (Unsigned wordsAddressSize, BitVector portWidth))
         -> Signal domain (BitVector portWidth)
-    mem = blockRam1 (SNat :: SNat fifoSizeWords) 0
+    mem = blockRam1 ClearOnReset (SNat :: SNat fifoSizeWords) 0

     initialState :: FifoState writePortWidth readPortWidth
     initialState = FifoStateDetermineFifoDepth
@@ -145,10 +144,9 @@
   { t_name   = "MinimalExample"
   , t_inputs =  [ PortName "clk"
                 , PortName "rst"
-                , PortProduct ""
-                  [ PortName "pop"
-                  , PortName "writeRequest"
-                  , PortName "writeData" ] ]
+                , PortName "pop"
+                , PortName "writeRequest"
+                , PortName "writeData" ]
   , t_output = PortProduct ""
                   [ PortName "dataAvailable"
                   , PortName "data"
@@ -156,8 +154,8 @@
                   , PortName "writeCapacity" ]
   }) #-}
 topEntity
-  :: Clock System 'Source
-  -> Reset System 'Asynchronous
+  :: Clock System
+  -> Reset System
   -> Signal System Bool
   -> Signal System Bool
   -> Signal System (BitVector 128)
@@ -169,7 +167,7 @@

     fifoInput :: Signal System (FifoInput 128 24)
     fifoInput = FifoInput <$> pop <*> writeDataM <*> pure 100    
-    fifoOutput = exposeClockReset fifo clk rst fifoInput
+    fifoOutput = exposeClockResetEnable fifo clk rst enableGen fifoInput

     dataValid = isJust <$> fifoData <$> fifoOutput
     readData = maybe 0 id <$> fifoData <$> fifoOutput 

(Be sure to remove the blockram1.json, otherwise the new clash will fail to run)

Then we run into what Christiaan mentioned, Clash doesn't handle CmpNat.

I've added this in the fix-type-families branch. There are still some issues with the type family resolution I'm working on, but your example (with the patch above) works.