calyxir / calyx

Intermediate Language (IL) for Hardware Accelerator Generators
https://calyxir.org
MIT License
502 stars 51 forks source link

Special Ports and Composable Compilation Passes #127

Closed sgpthomas closed 4 years ago

sgpthomas commented 4 years ago

Inspired by the discussion on PR #124 and issue #126, I think that we need to think more about how "compilation" passes should compose. "Compilation" passes are simply passes that turn control into structure.

Ideally we would only need to write compilation passes for a few base cases and then compose the passes to compile an entire program. The signatures of these base case passes would be approximately:

(seq (enable ..) (enable ..) ...) -> (enable ...)
(par (enable ..) (enable ..) ..) -> (enable ..)
(if (@ cond out) (A B ..) (enable T) (enable F)) -> (enable A B .. T F)
(while (@ cond out) (A B ..) (enable body)) -> (enable A B .. T F)

However, with the way that we are doing things now, transforming control into structure looses some information about how to enable sub-graphs.

I'll illustrate what I mean by looking at an example with if. Consider the following simple Dahlia program:

if (x < 2) {
  y = 3
} else {
  z = 10
}

with Futil control:

(if (@ < out) (x < 2)
  (enable y 3)
  (enable z 10))

and structure: image

This is the base case compilation case for if statements and we can compile it to something like: image

(enable x < 2 3 y 10 z)

What this pass does is connect the output of the comparator to the write-en of all the registers in the true branch and the inverse of the comparator to the write-en of the registers in the false branch. (I'm using registers here, but you can easily extend this to any side-effecting component as long as you have a way of identifying these).

Now consider a more complicated program:

if (x < 2) {
  y = 3
  ---
  y = 4
} else {
  z = 10
}

The Futil control program for this is:

(if (@ < out) (x < 2)
  (seq (enable y 3) (enable y 4))
  (enable z 10))

with structure: image

The first thing to do is turn the true branch of the if statement into a single enable. The control becomes:

(if (@ < out) (x < 2)
  (enable y 3 y 4 st0 st1 mux)
  (enable z 10)))

and the structure: image

But now we can't apply the same if pass that we did before, because we don't know how to 'enable' the (enable y 3 y 4 st0 st1 mux) statement. Ideally, we would want to generate hardware that looks like: image But to do this, we would need a separate pass to compile this kind of if statement. In other words, these compilation passes don't compose.

That was maybe the more subtle of the problems. There's also a problem in if branches that each branch may take a different number of cycles. When we compile the bodies to a single enable we lose information about how to tell if a branch is done. We would have to keep track of this either in a separate data structure or implicitly in the names of generated structure.

My proposal to fix this problem is to introduce semantically meaningful ports and embrace loaning. Concretely, components could optionally declare an input control port, and output control port, and a input clock port. The syntax might look like this:

(define/module name
  ((control) (clock) (port in 32) ...)
  ((control) (port out 32))

The reason to have this as syntax rather than just implicitly defined for everything is that combinational modules may not need any control or clock signals. For example, the std_const component would have the signature:

(define/prim std_const
  ()
  ((port out 32)))

I'm not sure yet if it would make sense to have multiple control inputs/outputs and whether their bitwidth should be specifiable.

Semantically, the input control port is meant to represent the valid wire. Internally to the component, when the input control port is high, it gets a changing clock signal that it can use to drive change. When it sets it's output control port to high, then it's input control port will become low and it will stop receiving clock updates. Externally, setting the input control signal on a subcomponent high will cause the subcomponent to 'run'.

Now we can change the if compilation base case to just wire up the condition output to the control inputs of the true and false branches.

The interesting part is the 'inductive' compilation case (just composing compilation passes). The idea is that a pass would create a loan of all the components it touches, and then inductively compile these components using the special control ports as an interface. Using a lot of made up syntax, the above example would be something like:

(define/component main () ()
  ...
  [-> (@ x out) (@ gt left)]
  [-> (@ 2 out) (@ gt right)]
  [-> (@ 3 out) (@ y in)]
  [-> (@ 4 out) (@ y in)]
  [-> (@ 10 out) (@ z in)]
 (if (@ gt out) (x gt 2)
   (seq (enable y 3)
        (enable y 4))
   (enable z 10)))

is converted to

(define/component main () ()
  ...
 ([-> (@ x out) (@ gt left)]
  [-> (@ 2 out) (@ gt right)]
  [-> (@ 3 out) (@ y in)]
  [-> (@ 4 out) (@ y in)]
  [-> (@ 10 out) (@ z in)]
  [loan true_branch (y 3 4)])
 (if (@ gt out) (x gt 2)
   (enable true_branch)
   (enable z 10)))

(define/component true_branch
 ((control) (clock))
 ((control))
 ((loan y std_reg) (loan 3 std_const) (loan 4 std_const))
 ([ ; same sugbraph structure ])
 (seq (enable y 3)
      (enable y 4)))

And now just use base case compilation passes on both components and because of the control ports, things will just compose.

Idk if this solution is a good idea, but I'm curious to hear people's thoughts.

sgpthomas commented 4 years ago

Another option that came up when I was discussing this with Kenneth is to be even more radical and abolish the notion of components altogether. The idea would be to have a single top level structure graph (and probably top level control too), and then replace define/component with something like define/interface. This construct would not be able to own it's own structure, it would just define an interface over some sub-graph of the structure. Furthermore, it would only define control signals and it could have a normal Futil control program.

I want to emphasize that this is a radical idea, but it may have some potential to clean things up. I'll expand on this more later, but I wanted to put this here while it was fresh in my mind.

tissue3 commented 4 years ago

Is there a chance that sub-graph prevent stop us from merging potential mergeable things? It seems function inlining sometimes leads to potential optimization, but we are doing the opposite way. But inevitably we need creating sub-graph to remove control and create large components.

Besides that, I am still confused at what removeif pass does.

(if (@ < out) (x < 2)
  (enable y 3)
  (enable z 10))

The RemoveIf pass will change it to

(enable x < 2 3 y 10 z)

Without explicit w_en signal created by RemoveIf pass, what is w_en? Is it driven by valid signal from enable? If so, what does it mean by ready? How does enable know a component not enabled, e.g. z is ready?

sgpthomas commented 4 years ago

I'm uncertain about your first point. I agree though that the "outlining" solution feels rather messy.

The write_en wire is fed into the write enable port of registers so that they commit the input value. It comes from the comparator specified in the if statement. However, what it doesn't do is present an interface so that we know how to run the (enable x < 2 3 y 10 z) statement. At the moment, none of our passes do this because (enable ...) doesn't present a particular interface about what it means to enable its subgraph.

Note that just giving every component the same interface doesn't solve this problem because you still don't know how to enable a subgraph.

sgpthomas commented 4 years ago

Group Proposal

A group represents a sub-graph that can be independently run with a single enable and no associated control. They are generated during the compilation to syntactically represent that some control has been compiled into structure. It allows a sub-graph to be turned into a "blackbox" that knows how to be enabled. For now we are using val/ready interfaces for enabling groups.

Proposed Syntax

1) Add group statement to the structure that declares a name and a set of subcomponent/groups to be included:

[group A (x y const1 const2)]

2) Change enable to only take in a single group:

(enable A B C) ; DISALLOWED
(enable group1)

3) groups have a single valid input port and a ready output port.

[group A (const1 const2 reg1 mem2)]
[-> (@ other ready) (@ A valid)]
[-> (@ A valid) (@ x in)]
[-> (@ x out) (@ A ready)]
[-> (@ A ready) (@ next valid)]

Note: Unlike normal ports, you can connect to them from both sides. Because of this we may want to differentiate the syntax.

How compilation works

Suppose that we start with the following program (the instantiations of primitives have been left out for brevity):

(define/component main () ()
  ([group cond (x 2 gt)]
   [group t1 (3 y)]
   [group t2 (4 y)]
   [group f1 (10 z)]

   [-> (@ x out) (@ gt left)]
   [-> (@ 2 out) (@ gt right)]
   [-> (@ 3 out) (@ y in)]
   [-> (@ 4 out) (@ y in)]
   [-> (@ 10 out) (@ z in)])

  (seq
   (if (@ gt out) cond
     (seq
      (enable t1)
      (enable t2))
     (enable f1))))

The first thing we do is make a new group for the true branch of if so that we can compile away the seq:

(define/component main () ()
  ([group cond (x 2 gt)]
   [group t1 (3 y)]
   [group t2 (4 y)]
   [group f1 (10 z)]

   [-> (@ x out) (@ gt left)]
   [-> (@ 2 out) (@ gt right)]
   [-> (@ 3 out) (@ y in)]
   [-> (@ 4 out) (@ y in)]
   [-> (@ 10 out) (@ z in)]

   [group seq_t1_t2 (t1 t2)])
  (seq
   (if (@ gt out) cond
     (enable seq_t1_t2)
     (enable f1))))

Now we implement the control for this group that implements the seq control:

(define/component main () ()
  ([group cond (x 2 gt)]
   [group t1 (3 y)]
   [group t2 (4 y)]
   [group f1 (10 z)]

   [-> (@ x out) (@ gt left)]
   [-> (@ 2 out) (@ gt right)]
   [-> (@ 3 out) (@ y in)]
   [-> (@ 4 out) (@ y in)]
   [-> (@ 10 out) (@ z in)]

   [group seq_t1_t2 (t1 t2)]
   [new-std fsm1 (fsm-state)]
   [new-std fsm2 (fsm-state)]
   [-> (@ seq_t1_t2 valid) (@ fsm1 in)]
   [-> (@ fsm1 out) (@ fsm2 in)]
   [-> (@ fsm2 out) (@ seq_t1_t2 ready)]
   [-> (@ fsm1 out) (@ t1 valid)]
   [-> (@ fsm2 out) (@ t2 valid)]
   [-> (@ fsm out) (@ seq_t1_t2 ready)])
  (seq
   (if (@ gt out) cond
     (enable seq_t1_t2)
     (enable f1))))

Then we can turn the if into a single group:

(define/component main () ()
  ([group cond (x 2 gt)]
   [group t1 (3 y)]
   [group t2 (4 y)]
   [group f1 (10 z)]

   [-> (@ x out) (@ gt left)]
   [-> (@ 2 out) (@ gt right)]
   [-> (@ 3 out) (@ y in)]
   [-> (@ 4 out) (@ y in)]
   [-> (@ 10 out) (@ z in)]

   [group seq_t1_t2 (t1 t2)]
   [new-std fsm1 (fsm-state)]
   [new-std fsm2 (fsm-state)]
   [-> (@ seq_t1_t2 valid) (@ fsm1 in)]
   [-> (@ fsm1 out) (@ fsm2 in)]
   [-> (@ fsm2 out) (@ seq_t1_t2 ready)]
   [-> (@ fsm1 out) (@ t1 valid)]
   [-> (@ fsm2 out) (@ t2 valid)]

   [group if (seq_t1_t2 cond f1)])
  (enable if))

and then wire up the if control:

(define/component main () ()
  ([group cond (x 2 gt)]
   [group t1 (3 y)]
   [group t2 (4 y)]
   [group f1 (10 z)]

   [-> (@ x out) (@ gt left)]
   [-> (@ 2 out) (@ gt right)]
   [-> (@ 3 out) (@ y in)]
   [-> (@ 4 out) (@ y in)]
   [-> (@ 10 out) (@ z in)]

   [group seq_t1_t2 (t1 t2)]
   [new-std fsm1 (fsm-state)]
   [new-std fsm2 (fsm-state)]
   [-> (@ seq_t1_t2 valid) (@ fsm1 in)]
   [-> (@ fsm1 out) (@ fsm2 in)]
   [-> (@ fsm2 out) (@ seq_t1_t2 ready)]
   [-> (@ fsm1 out) (@ t1 valid)]
   [-> (@ fsm2 out) (@ t2 valid)]

   [group if (seq_t1_t2 cond f1)]
   [new-std if_neg (std_not)]
   [new-std if_or_rdy (std_or)]
   [-> (@ if valid) (@ cond valid)]
   [-> (@ cond valid) (@ seq_t1_t2 valid)]
   [-> (@ cond valid) (@ std_not in)]
   [-> (@ std_not out) (@ f1 valid)]
   [-> (@ std_not out) (@ f1 valid)]
   [-> (@ seq_t1_t2 ready) (@ if_or_rdy left)]
   [-> (@ f1 ready) (@ if_or_rdy right)]
   [-> (@ if_or_rdy out) (@ if rdy)])
  (enable if))

Things to figure out

Possible extensions

rachitnigam commented 4 years ago

Thanks for writing this up! I think this clearly answers the question "What does high-level control turn into during compilation through FuTIL?"

A bunch of questions:

Q. From the description of the compilation, I am imagining that there is a seq-removal pass and an if-removal pass. What happens when seq-removal encounters this:

(seq 
  (if c1 (seq a1 a2) b1)
  (if c2 (seq a3 a4) b2))

The seqs inside the conditional branches can be easily transformed into groups. What do we do when a seq contains as-yet-unremoved control statements inside it?

More generally, it seems that removal of control structures recursively relies on all the control structures within it to be removed already.

Q. Do we care about checking that groups with shared components are never enabled together?

(...
   [group t1 (3 y)]
   [group t2 (4 y)]
...)
(par (enable t1) (enable t2))

If we don't explicitly check is, what is the correctness condition. "If you give us a program where the control never invokes shared components in parallel, FuTIL compilation is correct?"

Q. I'm not sure what is going on with the FSM states here:

   [group seq_t1_t2 (t1 t2)]
   [new-std fsm1 (fsm-state)]
   [new-std fsm2 (fsm-state)]
   [-> (@ seq_t1_t2 valid) (@ fsm1 in)]
   [-> (@ fsm1 out) (@ fsm2 in)]
   [-> (@ fsm2 out) (@ seq_t1_t2 ready)]
   [-> (@ fsm1 out) (@ t1 valid)]
   [-> (@ fsm2 out) (@ t2 valid)]

It seems like as soon as you drive fsm1.in, both t1.valid and t2.valid will be driven causing them to execute in parallel.

More generally, it seems you need some mechanism for a statement like x := 1 to signal that it is done writing and that it is safe to execute the next step.

sampsyo commented 4 years ago

This is super awesome! I agree that this seems like a promising path, and I agree with @sgpthomas's questions. For one in particular: Can we optionally express/use static latency for groups? My gut says yes; this could be the right way to incrementally build up information about static latency for parts of a component.

And also with @rachitnigam's:

More generally, it seems that removal of control structures recursively relies on all the control structures within it to be removed already.

Yeah, it does seem like that's the case. In general, it seems like we'll want to do these "control-to-structure" transformations "from the inside out," i.e., iteratively at the leaves of the control AST. (A consequence is that it won't do to have an "if removal" pass that runs once and a "seq removal" pass that runs once. Instead, it will be necessary to iterate the control collapsing passes to a fixed point, i.e., trivial control.)

Do we care about checking that groups with shared components are never enabled together?

Indeed, it seems like we want to classify that as a poorly behaved program. Going back to some classic Futil roots, I think the practical thing to do here, as a first cut, is to instrument the interpreter to yield dynamic errors when this happens—and to allow passes and backends to generate wrong code when it does.

That's under the hypothesis that static checking of this "simultaneous enabling" property will be undecidable in general, i.e., necessarily conservative in practice. So the next best thing is a precise dynamic check coupled with a sometime-down-the-road conservative static check.