christiaanb commented 4 years ago

NB: this text is work in progress Inspired by TL-Verilog (https://www.redwoodeda.com/tl-verilog, https://www.makerchip.com/, http://www.tl-x.org/), I want to propose an extension to Clash for pipelined designs. It will need GHC source plugins (perhaps combined with contraint-solver plugins) to work.

Inital example

The following code:

pyth = pipeline \valid a b -> do
  ?valid
    @1
       let aSq = a * a
           bSq = b * b
    @2
       let cSq = aSq + bSq
    @3 
       let c   = sqrt cSq
       bundle (valid, c)

which would desugar to:

pyth :: 
  (Num a, HiddenClock clk) => 
  DSignal dom n Bool -> 
  DSignal dom n a ->
  DSignal dom n a ->
  DSignal dom (n+3) (Bool, a)
pyth valid a b = let
    -- 1
    aD1 = dffE valid a
    bD1 = dffE valid b
    validD1 = dff valid
    aSq = (*) <$> aD1 <*> aD1
    bSq = (*) <$> bD1 <*> bD1
    -- 2
    aSqD1 = dffE validD1 aSq
    bSqD1 = dffE vallidD1 bSq
    validD2 = dff validD1
    cSq = (+) <$> aSqD1 <*> bSqD1
    -- 3
    cSqD1 = dffE validD2 cSq
    validD3 = dff validD2
    c = sqrt <$> cSqD1
 in bundle (validD3, c)

where

-- | Simple dflipflop, not reset, no enable
dff :: 
  HiddenClock dom => 
  DSignal dom d a -> 
  DSignal dom (d+1) a

-- | dflipflop with enable; no reset
dffE :: 
  HiddenClock dom => 
  DSignal dom d Bool ->
  DSignal dom d a ->
  DSignal dom (d+1) a

Using a pipelined circuit

Assume that:

sqrtS :: 
  (HiddenClock dom, Num a) =>
  DSignal dom d a ->
  DSignal dom d Bool ->
  DSignal dom (d+2) (Bool, a)

we would then write:

pyth = pipeline \valid a b -> do
  ?valid
    @1
       let aSq = a * a
           bSq = b * b
    @2
       let cSq = aSq + bSq
    @3 
       c <- sqrtS valid cSq
       pure c

which would desugar do:

pyth valid a b = let
    -- 1
    aD1 = dffE valid a
    bD1 = dffE valid b
    validD1 = dff valid
    aSq = (*) <$> aD1 <*> aD1
    bSq = (*) <$> bD1 <*> bD1
    -- 2
    aSqD1 = dffE validD1 aSq
    bSqD1 = dffE vallidD1 bSq
    validD2 = dff validD1
    cSq = (+) <$> aSqD1 <*> bSqD1
    -- 3
    cSqD1 = dffE validD2 cSq
    validD3 = dff validD2
    c = sqrtS validD3 cSqD1
 in c

Using values from a particular pipeline stage

looking back (relative offset)

pipeline \a b -> do
@1
  let x = f a b
@3
  let y = g (x `behind` 1) a

would desugar to:

\a b -> let
-- 1
aD1 = dff a
bD1 = dff b
x = f <$> aD1 <*> bD1
-- 2
aD2 = dff aD1
xD1 = dff x
-- 3
aD3 = dff aD2
y = g <$> fakeDelay d1 xD1 <*> aD3
in y

where

-- | pretend a signal has more delay than it actually has
fakeDelay :: 
  SNat n ->
  DSignal dom d a -> 
  DSignal dom (d+n) a

looking ahead (relative offset)

pipeline \a -> do
@1
  let x = f a (x `ahead` 1)

would desugar to:

\a -> let
-- 1
aD1 = dff a
x = f <$> aD1 <*> antiDelay d1 xD1
-- 2
xD1 = dff x
in x

where

-- | access a delayed signal in the present
fakeDelay :: 
  SNat n ->
  DSignal dom (d+n) a -> 
  DSignal dom n a

looking at an exact stage

pipeline \a b -> do
@1
  let x = f a b
@3
  let y = g (x `exactlyAt` 1) a

would desugar to:

\a b -> let
-- 1
aD1 = dff a
bD1 = dff b
x = f <$> aD1 <*> bD1
-- 2
aD2 = dff aD1
-- 3
aD3 = dff aD2
y = g <$> fakeDelay d2 x <*> aD3
in y

and

pipeline \a -> do
@1
  let x = f a (x `exactlyAt` 3)

would desugar to:

\ a -> let
-- 1
aD1 = dff a
x = f <$> aD1 <*> antiDelay d2 xD2
-- 2
xD1 = dff x
-- 3
xD2 = dff xD1
in x

Using choice constructs (case-expressions, guards, etc.)

pileline \a -> do
@1
  let x = case a of 
               True -> 3
               False -> 4

desugars to:

\a -> let
-- 1
aD1 = dff a
xF = \a -> case a of {True -> 3; False -> 4}
x = xF <$> aD1
in x

e.g. we abstract over all pipelined values to create a pure function, and then apply that function applicatively over the pipelined values.

Using non-pipelined arguments

f a = pipeline \b -> do
@1
  let x = g a b

desugars to:

f a = \b -> let
-- 1
bD1 = dff b
x = f <$> pure a <*> bD1

i.e. non-pipelined arguments are to be single values, not streams of values. So you get a type error if a non-pipelined arguments is of type Signal or DSignal

martijnbastiaan commented 4 years ago

Hmm.. I think this does two things:

Introduce idiom notation (something like this)
Introduce pipeline notation

I think we could benefit from splitting this up. If we have (1) I don't think we need extra "magical" syntax. Something like this might work:

pipeline valid $ do 
  (aSq, bSq) <- stage d1 $ do
    pure (a * a, b * b)

  cSq <- stage d2 $ do
    pure (aSq + bSq)

  stage d3 $
    sqrtS valid cSq

As a bonus, (1) could be reused in non-piplelined code too.

christiaanb commented 4 years ago

We need the magical syntax because the amount of delay is calculated as the difference between declaration and use site. In the example so far, the difference has always been one, but it can be arbitrary.

martijnbastiaan commented 4 years ago

In the example I posted I imagined the idiom notation to take care of auto-delaying signals where necessary, and the 'stage' function to force a specific delay for their "return values".

christiaanb commented 4 years ago

It has to be taken care of at the use site, because a value might be used at 2 places, where you want to share pipelining registers. e.g. in

pipeline \a -> do
  @1
      let z = f a
  @3
      let y = g a z
  @4
      let x = h z y

you want:

three pipeline registers for a, where the use in f a branches off from the first stage. and the use in g a z uses the last stage.
three pipeline registers for z, where the use in g a z branches off from the first stage, and the use in h z y uses the last stage.
one pipeline register for y, for the use in h z y

christiaanb commented 4 years ago

The way I imagined the algorithm, is that for every stage you get a delayed versions of the values declared in the previous stage: this includes delayed values in the stage before that. Then using some tying-the-knot recursion, you can prune the unused delayed values in the same traversal.

martijnbastiaan commented 4 years ago

It has to be taken care of at the use site, because a value might be used at 2 places, where you want to share pipelining registers

Got it, but I feel like we could solve this on a more fundamental level by improving CSE.

At any rate, I'd be hesitant to put this in clash-prelude as it feels both highly specific and subject to change on future insights / compiler improvements. Did you imagine putting this in a separate package?

christiaanb commented 4 years ago

Also, I'm not a big fan of idiom brackets, because it changes the meaning of parenthesis inside the bracket. e.g. ([(f a b) c d]) means something else than ([f a b c d]) because the the former desugars to fmap (f a b) <*> c <*> d and the latter to fmap f <*> a <*> b <*> c <*> d

christiaanb commented 4 years ago

I imaged it living in a separate package, definitely not in clash-prelude. Also, the proposal is far from finished.

martijnbastiaan commented 4 years ago

Ah ok, in that case I'm perfectly fine with the proposal :+1:

paddytheplaster commented 4 years ago

Hi Christiaan,

Thanks for proposing this extension, which I think is needed.

I don't understand the full ins and outs yet because it's not clear which notation is standard non-standard Haskell (a.k.a ghc extensons) and which notation is non-standard non-standard notation.

You're using @x to indicate levels in the pipeline. This seems too complex to me. Can't we use types to denote the values at the different levels? E.g. (d0,x) would be an input value, (d1,y) a value at level 1, and so on. We could then define operations on such values, such that e.g. f <$> (i1,a) (i2,b) evaluates to (d1 + max 1i i2,f a b).

You could then use these operations as building blocks to define the pipeline. Just a thought.

Regards,

Paddy

christiaanb commented 4 years ago

annotating every binder seems quite cumbersome, I think linking our approaches: My:

pipeline \a b p q r s -> do
@3
  let x = f a b
  let y = g p q
  let z = h r s
  bundle (x,y,z)

would I think translate to your:

\(d0,a) (d0,b) (d0,p) (d0,q) (d0,r) (d0,s) ->
  let (d3,x) = f a b
      (d3,y) = g p q
      (d3,z) = h r s
  in  (d3,(x,y,z))

or annotating at the type-level:

\(a :: d0) (b :: d0) (p :: d0) (q :: d0) (r :: d0) (s :: d0) ->
  let (x :: d3) = f a b
      (y :: d3) = g p q
      (z :: d3) = h r s
  in (x,y,z) :: d3

The thing is, if you do annotation at the type-level, then you need to annotate every binder. Whereas "my" approach using scoping to "annotate" all the relevant bindings. Also, the point of my proposal was not to calculate the delay of a computation, it is to automatically add pipeline registers to make sure everything is synchronised.

kejace commented 4 years ago

It has to be taken care of at the use site, because a value might be used at 2 places, where you want to share pipelining registers

I'm wondering if the QualifiedDo notation for an Indexed Monad could help achieve the same thing, and thereby alleviating @martijnbastiaan concerns?

clash-lang / clash-compiler

TL-Clash: pipelining notation for Clash #1336

Inital example

Using a pipelined circuit

Using values from a particular pipeline stage

looking back (relative offset)

looking ahead (relative offset)

looking at an exact stage

Using choice constructs (case-expressions, guards, etc.)

Using non-pipelined arguments