calyxir / calyx

Intermediate Language (IL) for Hardware Accelerator Generators
https://calyxir.org
MIT License
485 stars 48 forks source link

Par semantics and describing PCFGs algebraically #2278

Open janpaulpl opened 3 weeks ago

janpaulpl commented 3 weeks ago

Background

The previously proposed work discussed building techniques for getting a CFG representation of our data flow to be able to apply static analysis on these and prevent data races:

From my current understanding, Calyx is still using (or I should say, planning on using) the DRF0 semantics presented in #932, which led to us needing more constructs to enforce synchronization when needed. On #921, it's discussed that using something akin to "Lockstep" semantics would cause implicit-uncontrollable freedom. While we have a latency-insensitive version of explicit synchronization (@sync(n)), we still require a static-timing version.

I want to address a few things, which directly target the technical debt in #1825:

My main motivation for this proposal however is that there is some interesting ongoing work with HardKAT. Ideally, this proposal would at the very least aid with the development of a front-end for compiling from Calyx to HardKAT (which shows to be particularly difficult because of the lack of formal semantics in regards to parallelism). I believe some people at BlueSpec are working on their front-end for HardKAT and it does require extending their existing formal semantics.

Technique

My current approach relies on the SKA paper, although more general pomset concurrent KA frameworks could be considered. There has to exist a mapping between our constructs and SKA operators, particularly the synchronous product operator. What does SKA gain us?

I won't go into the very-technical details with regards to our current issues with par, but I will assume a small-naive par construct for the sake of this example. Let's take a very small subset of Calyx --- PCalyx --- that trivially holds an execution judgment for PCalyx programs:

<p, sigma> -> <p', sigma'>
------------------------------------------------------------ step
Program p in state sigma steps to p' with new state sigma

We take the following rules for the par construct:

 <p1, sigma> -> <p'1, sigma'>
 -------------------------------------------------- par-left
 <par{p1,p2}, sigma> -> <par{p'1,p2}, sigma'>

 <p2, sigma> -> <p'2, sigma'>
 -------------------------------------------------- par-right
 <par{p1,p2}, sigma> -> <par{p1,p'2}, sigma'>

 p1 = done     p2 = done
--------------------------------------------- par-done
 <par{p1, p2}, sigma> -> <done, sigma>

For groups $g$, we can define:

 g is a group
--------------------------------------------- group-start
<g, sigma> -> <gbody, sigma[g.go -> 1]>

 g.done = 1 in sigma
------------------------- group-done
<gbody, sigma> -> <done, sigma>

Moreover, we can handle latency-insensitive tasks by introducing a delay:

 true
----------------------------------- delay
<delta, sigma> -> <done, sigma>

In summary, we have par-left and par-right which allow non-deterministic interleaving, group-start/group-done are the activation and completion of groups respectively, and a delay.

With this toy PCalyx subset, we can illustrate what I mean by using KA. Let's take the following snippet:

group g1 {
  x.in = a.out;
  g1[done] = x.done;
}

group g2 {
  y.in = b.out;
  g2[done] = y.done;
}

control {
  par { g1; g2 }
}

We gain the following representation:

$g1 := (x.in \leftarrow a.out) \cdot (g1[done]\leftarrow x.done)$ $g2 := (y.in \leftarrow b.out) \cdot (g2[done] \leftarrow y.done)$ $par(g1, g2) := g1 \times g2$.

Here, $\times$ is the synchronous product from SKA. We then gain the following properties:

$par(g1, g2) = par(g2, g1)$ (commutativity) $par(g1, par(g2, g3)) = par(par(g1, g2), g3)$ (associativity)

And to handle latency-insensitivity, we have:

$g1 := (x.in \leftarrow a.out) \cdot \delta \cdot (g1[done] \leftarrow x.done)$ $g2 := (y.in \leftarrow b.out) \cdot \delta \cdot (g2[done] \leftarrow y.done)$

For it to fully make sense with SKAT we would need a SKAT-Par formalization $(K, B, +, \cdot, \times, *, 0, 1, \neg)$ which consists of a Kleene algebra, a boolean algebra, and the synchronous product operator $K \times K \to K$. Here we would also introduce the formalization of the delay-operator.

Finally, the core idea here is to gain parallel composition without a global clock (hence the DRF0 extension semantics). We can use pomset semantics for defining an event partial order on a local time domain for PCalyx as follows:

This would require some KA proofs, but the final application to PCalyx should look something like this:

$g := g.go \cdot \delta \cdot g_{body} \cdot \delta \cdot g.done$ $g_1 \times g_2 = (g1.go \cdot \delta* \cdot g{1body} \cdot \delta \cdot done) \times (g_2.go \cdot \delta \cdot g_{2body} \cdot \delta* \cdot g_2.done)$

Everything thus far has just been a framework for par semantics. However, we would also now like to get PCFGs in this context. If we have a set of semantics that imply data-race freedom, we also gain a target for analyzing this.

Discussion

As I mentioned previously, this is mostly an attempt to formalize some wanted semantics that define the core of PCFGs. Of course, I've just outlined a skeleton of how the process for formalization would look like, but I do think there would be a lot of trickle-down actions this set of semantics could contribute to. I do have some questions:

Thoughts?

janpaulpl commented 2 weeks ago

Here is a more in-depth proposal I just finished working on.