Par semantics and describing PCFGs algebraically

Background

The previously proposed work discussed building techniques for getting a CFG representation of our data flow to be able to apply static analysis on these and prevent data races:

1225
1366

From my current understanding, Calyx is still using (or I should say, planning on using) the DRF0 semantics presented in #932, which led to us needing more constructs to enforce synchronization when needed. On #921, it's discussed that using something akin to "Lockstep" semantics would cause implicit-uncontrollable freedom. While we have a latency-insensitive version of explicit synchronization (@sync(n)), we still require a static-timing version.

I want to address a few things, which directly target the technical debt in #1825:

Formalize Explicit Synchronization semantics as mentioned over at #921 using Partially Ordered Multiset (pomset) semantics in Kleene Algebras.
- While this might sound like word salad at first, there are a lot of nice representations for static data-flow analysis which describe CFGs using KA (like here, here, and also here) --- hence an algebraic program analysis.
- Pomset semantics are useful for achieving the P from (P)arallelCFG. There is some work on using Synchronous Kleene Algebras (SKA) for targetting Network Flow and more on Concurrent KA.
With these semantics, I think we can get a trickle-down effect towards tooling @sync.
The actual implementation of these semantics for generating the PCFGs isn't necessarily something I'm expecting to target here, but overall I want a formal write-up that extends DRF0.

My main motivation for this proposal however is that there is some interesting ongoing work with HardKAT. Ideally, this proposal would at the very least aid with the development of a front-end for compiling from Calyx to HardKAT (which shows to be particularly difficult because of the lack of formal semantics in regards to parallelism). I believe some people at BlueSpec are working on their front-end for HardKAT and it does require extending their existing formal semantics.

Technique

My current approach relies on the SKA paper, although more general pomset concurrent KA frameworks could be considered. There has to exist a mapping between our constructs and SKA operators, particularly the synchronous product operator. What does SKA gain us?

Defines a loose synchronous product for non-deterministic execution order
Soundness and completeness of latency-insensitive semantics

I won't go into the very-technical details with regards to our current issues with par, but I will assume a small-naive par construct for the sake of this example. Let's take a very small subset of Calyx --- PCalyx --- that trivially holds an execution judgment for PCalyx programs:

<p, sigma> -> <p', sigma'>
------------------------------------------------------------ step
Program p in state sigma steps to p' with new state sigma

We take the following rules for the par construct:

 <p1, sigma> -> <p'1, sigma'>
 -------------------------------------------------- par-left
 <par{p1,p2}, sigma> -> <par{p'1,p2}, sigma'>

 <p2, sigma> -> <p'2, sigma'>
 -------------------------------------------------- par-right
 <par{p1,p2}, sigma> -> <par{p1,p'2}, sigma'>

 p1 = done     p2 = done
--------------------------------------------- par-done
 <par{p1, p2}, sigma> -> <done, sigma>

For groups $g$, we can define:

 g is a group
--------------------------------------------- group-start
<g, sigma> -> <gbody, sigma[g.go -> 1]>

 g.done = 1 in sigma
------------------------- group-done
<gbody, sigma> -> <done, sigma>

Moreover, we can handle latency-insensitive tasks by introducing a delay:

 true
----------------------------------- delay
<delta, sigma> -> <done, sigma>

In summary, we have par-left and par-right which allow non-deterministic interleaving, group-start/group-done are the activation and completion of groups respectively, and a delay.

With this toy PCalyx subset, we can illustrate what I mean by using KA. Let's take the following snippet:

group g1 {
  x.in = a.out;
  g1[done] = x.done;
}

group g2 {
  y.in = b.out;
  g2[done] = y.done;
}

control {
  par { g1; g2 }
}

We gain the following representation:

$g1 := (x.in \leftarrow a.out) \cdot (g1[done]\leftarrow x.done)$ $g2 := (y.in \leftarrow b.out) \cdot (g2[done] \leftarrow y.done)$ $par(g1, g2) := g1 \times g2$.

Here, $\times$ is the synchronous product from SKA. We then gain the following properties:

$par(g1, g2) = par(g2, g1)$ (commutativity) $par(g1, par(g2, g3)) = par(par(g1, g2), g3)$ (associativity)

And to handle latency-insensitivity, we have:

$g1 := (x.in \leftarrow a.out) \cdot \delta \cdot (g1[done] \leftarrow x.done)$ $g2 := (y.in \leftarrow b.out) \cdot \delta \cdot (g2[done] \leftarrow y.done)$

For it to fully make sense with SKAT we would need a SKAT-Par formalization $(K, B, +, \cdot, \times, *, 0, 1, \neg)$ which consists of a Kleene algebra, a boolean algebra, and the synchronous product operator $K \times K \to K$. Here we would also introduce the formalization of the delay-operator.

Finally, the core idea here is to gain parallel composition without a global clock (hence the DRF0 extension semantics). We can use pomset semantics for defining an event partial order on a local time domain for PCalyx as follows:

Local Time Domain: For each component $c$ in a PCalyx program, we associate a local time domain $T_c$, which is a totally ordered set representing the sequence of events in that component.
Event Partial Order: We define a partial order relation $\prec$ on the set of all events accross all components. For events $e_1, e_2$:
- If $e_1$ and $e_2$ are in the same local time domain $T_c$, then $e_1 \prec e_2$ iff $e_1$ occurs before $e_2$ in $T_c$.
- If $e_1$ and $e_2$ are in different time domains, $e_1 \prec e_2$ iff there exists a chain of casually related events connecting $e_1$ to $e_2$.
Parallel Composition: For PCalyx components $a$ and $b$, their parallel composition $a \times b$ is defined as the set of all possible interleavings of events from $a$ and $b$ that respect the partial order $\prec$.

This would require some KA proofs, but the final application to PCalyx should look something like this:

$g := g.go \cdot \delta \cdot g_{body} \cdot \delta \cdot g.done$ $g_1 \times g_2 = (g1.go \cdot \delta* \cdot g{1body} \cdot \delta \cdot done) \times (g_2.go \cdot \delta \cdot g_{2body} \cdot \delta* \cdot g_2.done)$

Everything thus far has just been a framework for par semantics. However, we would also now like to get PCFGs in this context. If we have a set of semantics that imply data-race freedom, we also gain a target for analyzing this.

Discussion

As I mentioned previously, this is mostly an attempt to formalize some wanted semantics that define the core of PCFGs. Of course, I've just outlined a skeleton of how the process for formalization would look like, but I do think there would be a lot of trickle-down actions this set of semantics could contribute to. I do have some questions:

There have been lots of updates on the implementation par and the ideas behind how to represent the CFGs, do any of these directly affect any motivations behind needing to formalize par and as a consequence PCFGs algebraically?
What are some open (github) issues that address problems related to structures used within par blocks or @sync annotations?

Thoughts?

calyxir / calyx