clash-lang / clash-compiler

Haskell to VHDL/Verilog/SystemVerilog compiler
https://clash-lang.org/
Other
1.45k stars 154 forks source link

Complete the zoo of registers #802

Open martijnbastiaan opened 5 years ago

martijnbastiaan commented 5 years ago
Clock Reset Enable Initial Name
X dflipflop
X X dflipflopI?
X X dflipflopE?
X X dflipflopR?
X X X delay -> delayE?
X X X delayR?
X X X
X X X X register

Once we figure out a satisfactory way to do clockless registers (similar to asynchronous resets maybe?):

Clock Reset Enable Initial Name
dlatch
X srlatch
...

Alternatively, we would create a separate module where ALL the kinds of registers are exported with systematic naming. Something like (for the record: I don't know whether all these kinds of latches are actually possible):

Clock Reset Enable Initial Name
X dff
X X dffI
X X dffE
X X dffR
X X X dffRE
X X X dffRI
X X X dffRE
X X X X dffREI
latch
X latchI
X latchE
X latchR
X X latchRE
X X latchRI
X X latchRE
X X X latchREI
blaxill commented 5 years ago

FWIW the I suffix is currently used throughout the code base to imply an implicit typelevel Nat.

s5bug commented 3 years ago

Has any progress been made here? I've been looking for a high-level HDL suitable for writing clockless circuits (heavy use of D Latches and C Elements) and I'd really love to be able to do it with Clash.

I've basically lost my mind trying to search for HDLs that are good for clockless circuits because I can only find references to a "Haste" and a "TiDE" that seem to not exist.

christiaanb commented 3 years ago

@s5bug I would love that as well, and while I've spend some time on investigating clockless (like 80 hours if I had to guess) circuits, I haven't really spend enough to understand:

  1. How to model the "essence" of clockless design
  2. And then create an API that enables you to performance-optimize your clockless design, by which I mean you making adjustments to the circuit description so that some performance metric improves. i.e. the clockless counterpart of pipelining and retiming of the clocked world.

What I mean by that, currently in Clash we model clocked sequential logic as functions operating on infinite streams, where the elements of these streams represent the "stable" (in the not violating setup+hold timing sense) values, spaced equally in time in correspondence to the clock.

So in a clockless world, we have multiple options:

  1. Drop both "spaced equally in time" and the notion of "stable values" (while simultaneously not being able to model/observe the analog nature of bi-stables either, i.e. meta-stability), and model clockless sequential logic as functions operating on "functions-from-time-to-discrete-value", e.g.
    
    module ClocklessPure where

import Control.Applicative

type Time = Float newtype Signal a = S { runS :: Time -> (a, Signal a) }

instance Functor Signal where fmap f s = S $ \t -> let (a,sC) = runS s t in (f a, fmap f sC)

instance Applicative Signal where pure a = let q = S (const (a,q)) in q fS <> aS = S $ \t -> let (f,fC) = runS fS t (a,aC) = runS aS t in (f a, fC <> aC)

gatedDLatch :: Signal Bool -> Signal a -> Signal a gatedDLatch enaF dF = S $ \t -> let val = (0.0,undefined) in go val enaF dF t where go :: (Time,a) -> Signal Bool -> Signal a -> Time -> (a, Signal a) go (tP,qP) enaF dF t = if tP <= t then let (ena,enaC) = runS enaF t -- TODO: does stuff break if we only evaluate/force -- dF inside the ena == True branch? (d,dC) = runS dF t in if ena then (d, S (\t -> go (t,d) enaC dC t)) else (qP, S (\t -> go (tP,qP) enaC dC t)) else error "sampling time must flow forward"

fromList :: [(Time,a)] -> Signal a fromList [] = error "empty list" fromList q@((t,a):as) = S $ \tC -> if tC <= t then (a,fromList q) else runS (fromList as) tC

simulate :: (Signal a -> Signal b) -> [(Time,a)] -> [(Time,b)] simulate f samplesIn = go (f (fromList samplesIn)) (fst <$> samplesIn) where go _ [] = error "finite list" go s (t:ts) = let (b,sC) = runS s t bs = go sC ts in ((t,b):bs)

where we could then model and test an accumulator like so:
```haskell
mux = liftA3 (\b t f -> if b then t else f)

accumulate rst ena d =
  let s1 = gatedDLatch ena           (mux rst (pure 0) (liftA2 (+) s2 d))
      s2 = gatedDLatch (not <$> ena) s1
   in s1

unbundle abc = ( fmap (\(a,_,_) -> a) abc
               , fmap (\(_,b,_) -> b) abc
               , fmap (\(_,_,c) -> c) abc)

test = simulate ((\(r,en,d) -> accumulate r en d) . unbundle) $
                zip [0,0.1 ..]
                    ((True,True,0):(False,False,0):go [1..10])
  where
    -- strobe enable for every element
    go [] = []
    go (a:as) = (False,True,a):(False,False,a):go as

and observe the results:

ghci> test
[(0.0,0),(0.1,0),(0.2,1),(0.3,1),(0.4,3),(0.5,3),(0.6,6),(0.7,6),(0.8,10),(0.90000004,10),(1.0,15),(1.1,15),(1.2,21),(1.3000001,21),(1.4,28),(1.5,28),(1.6,36),(1.7,36),(1.8000001,45),(1.9,45),(2.0,55),(2.1000001,55)*** Exception: finite list
CallStack (from HasCallStack):
  error, called at ClocklessPure.hs:57:17 in main:ClocklessPure
ghci> snd <$> test
[0,0,1,1,3,3,6,6,10,10,15,15,21,21,28,28,36,36,45,45,55,55*** Exception: finite list
CallStack (from HasCallStack):
  error, called at ClocklessPure.hs:57:17 in main:ClocklessPure
  1. Abstract away from time as well, and make an API that's basically petri-nets (I haven't really investigated this yet).
  2. Something else?

I'm not a clockless circuit designer, so I don't really have clue what a desirable API is. It would be really helpful if someone experienced in clockless design could:

  1. Create a couple of "mock" descriptions of clockless circuits they would want to write
  2. Give the "semantics" of those circuits in terms of: for these given inputs, it should give these expected outputs
s5bug commented 3 years ago

I'm not experienced, but I'm working through Introduction to Asynchronous Circuit Design by Jens Sparsø, so I don't have much of the knowledge you want, but I can provide some starting points:

Obviously there's the bit D Latch:

bitDlatch (en, d) = (q, qbar)
  where
    q = !(qbar || (en && !d))
    qbar = !(q || (en && d))

This can be composed into a dlatch that holds an arbitrary value, i.e. one that looks like (Bool, a) -> a.

There's also the C Element:

celement (a, b) = y
  where
    y = (a && b) || (y && (a || b))

From these you can build the push-based asynchronous version of the DFF (I haven't made it to Chapter 10 where pull-based circuits are discussed):

echo (lReq, rAck, lData) = (rReq, lAck, rData)
  where
    ready = celement (not rAck, lReq)
    rReq = ready
    lAck = ready
    rData = dlatch (ready, lData)

I'm fairly certain (but by no means entirely certain!) that the intended behavior of this circuit is:

  1. Data is pushed to lData
  2. lReq is set high
  3. The circuit should latch the data in its dlatch, and then set rReq high, and wait for rAck to be set high
  4. Once rAck is set high, it should set lAck high
  5. lReq is allowed to be set low, and this should bring lAck low

And you could chain those together to build a FIFO:

fifo (lReq, rAck, lData) = (rReq, lAck, rData)
  where
    (req1, lAck, data1) = echo (lReq, ack1, lData)
    (req2, ack1, data2) = echo (req1, ack2, data1)
    (rReq, ack2, rData) = echo (req2, rAck, data2)

There's a lot of mutual recursion here, which I assume is very hard to deal with. The interactive simulator I was using for the equivalent circuits in Verilog, DigitalJS, handled the echo circuit just fine, but completely gave up when faced with fifo.