Closed: rachitnigam closed this 1 year ago
Hell yes; this seems great. I really like the introduction of the explicit "relative time" bubble within each group; it is a great way to reflect the philosophy of group isolation while transposing it into a statically-timed domain.
To state something explicitly that is implicit in the above… since every static group has a line like this:

```
group[done] = %N;
```

…you can easily get the group's latency (what we currently call `@static(N)`): it's N, of course. :smiley:
One other obvious thing to make explicit: the time instant `%4` is sugar for the time range `%[4:4]`.
On static `repeat n` loops: maybe this was what you were getting at already, but it seems like we'd want to copy-n-paste the body's assignment `n` times? That is, keeping it active for longer doesn't seem the same as repeating its effects multiple times.
This is really interesting. I think it will certainly help accurately and precisely capture the kinds of pipelines we've been using Calyx for in CIRCT!
One question that wasn't immediately obvious:
The proposed `<lhs> = %[n:m]` construct says to assert `<lhs>` from relative cycle `n` through `m`, correct?
I was initially confused, because normally the right hand side of an assign can be constants, ports, etc. (I forget the exact rules). But this is a pretty specific case (a constant 1 at certain time steps).
Is the idea that this special case is the only limited way this can be used? If so, I guess there is a rule that the `<lhs>` must be a single bit? Are there other restrictions?

Also, what is the value of `<lhs>` outside the timesteps `n:m`? Is it `0`/`X`? I guess if it is just this specific case, the compiler knows it is `0` outside the time steps where it is `1`.
Good questions, @mikeurbach. @rachitnigam is of course the authority here, but I can give you my gut reactions and get corrected down the line:
- The `%[n:m]` construct would be a new kind of thing that can go on the RHS of assignments, adding to constants and ports. It is the new invention here to cope with static timing.
- (`%[n:m]` expressions would have the type 1, which means they can only be assigned into 1-bit ports.)

Thanks for the explanation @sampsyo, that makes sense to me. The part about the type of `%[n:m]` expressions really clears it up. Looking forward to seeing where this goes, and getting CIRCT to support the new representation so we can target it from our all-static flow.
Misc syntax note from the meeting: the done assignment could possibly be folded into the declaration rather than the assignment syntax.
```
static group foo { // Starts at time `t`
  ...
  foo[done] = %1; // Asserted at `t+1`
}
```
becomes something like (syntax entirely made up):
```
static group foo<1> { // Starts at time `t`
  ...
}
```
Indeed; there is something nice about @EclecticGriffin's proposed syntax (or, like, `static(1) group foo`, which is very similar to our current annotation system) because it doesn't imply that there is an actual signal required to tell the group to stop executing (as there is in a "normal" group).
Two minor things struck me after our synchronous conversation yesterday:
v1.0 of this system will allow nesting static "islands" within dynamic control programs, which is definitely the most useful thing (it's an enactment of the "globally asynchronous, locally synchronous" (GALS) principle). We don't allow the other way around, for obvious reasons. But @andrewb1999's questions made me think about whether we could, in an extension, do something to allow the "backward" nesting.
Here is a very silly strawperson construct to achieve this. Introduce a `limit` control construct:
```
static seq {
  my_static_group;
  static limit(25) {
    my_dynamic_group;
  }
  my_other_static_group;
}
```
Here, `my_dynamic_group` can take any amount of time by itself. Wrapping it in `limit(25)` makes it take a deterministic 25 cycles. It does this by just aborting the execution altogether if it takes any longer than 25 cycles, and idling if it takes less than 25 cycles. Sort of a brute-force thing, but you can kinda imagine semi-contrived situations where you would want this: for example, when you know (but can't prove to the compiler) that accessing a memory takes either 3 or 25 cycles (on a cache hit or a cache miss).
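One way to imagine compiling `limit(25)` is a static group whose done condition ignores the wrapped group entirely. This is only a sketch; the group name is made up, and guarding a `go` signal on a `done` port like this goes slightly beyond what the proposal specifies:

```
static group limited {
  // Keep the dynamic group running until it finishes on its own.
  my_dynamic_group[go] = !my_dynamic_group[done] ? 1'd1;
  // Unconditionally declare completion at cycle 25, aborting a
  // still-running execution and idling after an early finish.
  limited[done] = %25;
}
```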
`while` Around Static Body

A thread that comes up often, again raised by @andrewb1999, is how we waste a cycle in traditional Calyx `while` loops:
```
while cond {
  body;
}
```
If `body` is a dynamic group that takes 5 cycles, then the overall loop takes 6 cycles per iteration. We waste one cycle to "realize" that the body has finished executing. This seems sad.
It should be a design goal for this proposal that we don't waste a cycle, i.e., that the same loop takes only 5 cycles per iteration when `body` is a static group. (Of course, `while` is always dynamic—the only kind of static loop that makes sense is a `repeat` loop.) This goal implies that we cannot compile the loop by first "wrapping" `body` in a dynamic interface and then compiling as normal (which is the normal/universal approach to compiling static islands nested within dynamic control). Instead, we would want our dynamic control compilation pass to have a special case for `while` loops that wrap static bodies.
@sampsyo one note on a previous comment:
> On static repeat n loops: maybe this was what you were getting at already, but it seems like we'd want to copy-n-paste the body's assignment n times? That is, keeping it active for longer doesn't seem the same as repeating its effects multiple times.
Not sure if we want to duplicate the body because that tends to blow up the FSM by a lot. The TDST implementation (#1338) is explicitly designed to not duplicate the body for this reason.
Ah, yes, good point. Just copy-and-paste also seems wrong for that reason! At least for how to actually compile stuff; I hope it works OK as a trivial/inefficient way to compile it (i.e., a "reference semantics").
Stepping back a little to the thing I was reacting to originally: this proposed compilation for `repeat n { body }` still looks a little weird to me:
```
static group compiled {
  body[go] = %[0:n*body's delay];
  compiled[done] = %(n*body's delay);
}
```
Partly because I'm not sure exactly what assigning to `body[go]` really does when `body` is a static group, and partly because it doesn't say much about how to repeatedly re-start the relative clock for `body`.
To state the perhaps-obvious, the way we'd actually want to compile this is to create a counter register. Call it `ctr`; it has `ceil(log(n))` bits. This is separate from the "relative clock" FSM register for `body`—call that `fsm`. Whenever `fsm` hits `body`'s delay, we increment `ctr` and reset `fsm` back to 0. When `ctr` hits `n`, the loop is done.
The trouble is that this compilation result seems hard to express as Calyx code. (Whereas the trivial copy-n-paste approach is of course very easy to express.)
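To make the counter scheme concrete, here is a rough sketch written as a plain dynamic group with guarded assignments—which is exactly the style the static-group abstraction can't express directly. The cells `fsm`, `ctr`, and the adders are made up, and write-enable wiring is elided:

```
group compiled_repeat {
  // Keep (re-)starting the body until the counter saturates.
  body[go] = ctr.out != n ? 1'd1;
  // Advance the relative clock while the body is mid-execution...
  fsm.in = fsm.out != body_delay ? fsm_add.out;  // fsm_add computes fsm.out + 1
  // ...and when it hits the body's delay, reset it and bump the counter.
  fsm.in = fsm.out == body_delay ? 'd0;
  ctr.in = fsm.out == body_delay ? ctr_add.out;  // ctr_add computes ctr.out + 1
  compiled_repeat[done] = ctr.out == n ? 1'd1;
}
```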
This occurred to me today during the CIRCT ODM, but have you looked at TL-X? CIRCT hosted a talk early last year. One thing that stuck out then is the "alignment" operator, which seems spiritually similar to the `%[n:m]` operator here. The syntax in TL-X has changed through revisions, but it's the `>>` and `<<` in Section 16.4 here. I think the construct in TL-X is a more general way to access any value across pipeline stages, but it seems spiritually similar in the notion of relative time offset. I guess I'm curious if you can draw any inspiration from TL-X as you are thinking about IR and languages for pipelining.
Thanks for the pointer; I had not seen this particular feature in TL-*!
This has been implemented in spirit.
This proposal follows up on https://github.com/cucapra/calyx/discussions/1334 to add static groups to the language to enable reasoning about timing. The core abstraction is using a "relative clock" to trigger actions. It additionally deprecates the `@static` syntax in favor of a `static` keyword.

Static groups
A static group is declared using the new `static` keyword.

The `%n` syntax is a relative clock: `%0` is the current cycle, `%1` is the next cycle, and so on. The `%n` syntax is only allowed in a `static` group. The well-formedness constraint for a static group is that its done condition must be a relative clock, and all actions within are assumed to complete before the done condition is asserted.

Chaining operations is straightforward.
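As a sketch, a static group that chains writes into two registers `r` and `d` (the register names come from the discussion below; the constants are illustrative) might look like:

```
static group chain {
  r.in = 32'd10;
  r.write_en = %0;   // write r in the first cycle
  d.in = r.out;
  d.write_en = %1;   // write d in the next cycle
  chain[done] = %2;  // the group's latency is 2
}
```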
This group looks suspiciously similar to a `seq` of two groups that write values into the registers `r` and `d`. This is intentional.

Static Control
Static control operators simply use the `static` keyword in front of the standard control operator. The well-formedness condition requires that they only contain other static control programs within them. Group enable of a static group is the smallest static control program. Furthermore, they obey all the constraints and guarantees from https://github.com/cucapra/calyx/discussions/1331. The composition principle for static control programs is that they can be compiled into static groups.

Sequences
The semantics of static control operators are given in terms of a relative clock: in a static `seq`, `one` starts at `%0` and `two` starts at `%0 + %(one's delay)`. This means we can compile the static control program into a single static group, with each sub-group's `go` asserted during its slot in the schedule.
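As a sketch, the compiled group for `static seq { one; two; }` might look like the following, using the `%[n:m]` syntax described next and the same pseudo-notation for delays used elsewhere in this proposal:

```
static group compiled_seq {
  one[go] = %[0 : one's delay];
  two[go] = %[one's delay : one's delay + two's delay];
  compiled_seq[done] = %(one's delay + two's delay);
}
```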
The special syntax `%[n:m]` is a signal that is asserted between cycles `n` and `m`. During compilation, `one's delay` etc. are going to be concrete numbers.

Parallel
The semantics are again given by compiling into a single group: each arm of the `par` starts at `%0`, and the compiled group is done when the longest arm finishes.
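A sketch of the compiled group for `static par { one; two; }`, with the same pseudo-notation for delays:

```
static group compiled_par {
  one[go] = %[0 : one's delay];
  two[go] = %[0 : two's delay];
  compiled_par[done] = %(max(one's delay, two's delay));
}
```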
Conditional
Conditionals execute their branches in the same cycle that they begin.
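A sketch of the compiled group for `static if cond { tru } else { fal }`, treating the relative-clock ranges as combinable with ordinary guards (illustrative syntax; `d` stands for the larger of the two branch delays):

```
static group compiled_if {
  tru[go] = cond.out ? %[0:d];
  fal[go] = !cond.out ? %[0:d];
  compiled_if[done] = %d;
}
```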
A couple of things are interesting about this encoding:

- The group's delay is the `max` of the two branches, which implicitly encodes the fact that we balance `if` branches before compiling them.
- This differs from `@static if` in https://github.com/cucapra/calyx/pull/1338, which allocates different states for each branch and transitions to one or the other path based on the condition. Perhaps this distinction doesn't matter?

Loops
Static loops necessarily need to be bounded. I'm going to define a new control operator `repeat n` to reflect the fact that a static loop without a `@bound` is meaningless. Maybe we want to eventually actually define this as a control operator (in which case we'd need to handle the case where the body is not static). A `repeat` is the same as a `@bound`-annotated static `while` over the same body.
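A sketch of that equivalence (the comparator cell and combinational group in the `while` form are illustrative):

```
// A bounded while over a static body...
@bound(4) while lt.out with cond {
  body;
}
// ...is the same as:
static repeat 4 {
  body;
}
```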
This probably gets compiled to:
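Presumably the result is the compiled group quoted earlier in the thread:

```
static group compiled {
  body[go] = %[0:n*body's delay];
  compiled[done] = %(n*body's delay);
}
```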
I'm not quite sure if this is correct so would have to think harder.
Lowering to Dynamic Groups
In order to compile a Calyx program, we first lower all static groups to normal groups by reifying the relative clock. For example, a static group compiles to a normal group that tracks the relative clock in an explicit FSM register.
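A sketch of the lowering, assuming a 2-cycle static group and an `fsm` register (the adder wiring is elided; all names here are illustrative):

```
// Before: a static group using the relative clock.
static group ex {
  r.write_en = %0;
  ex[done] = %1;
}

// After: the relative clock is reified as a register `fsm`.
group ex {
  r.write_en = fsm.out == 'd0 ? 1'd1;
  ex[done] = fsm.out == 'd1 ? 1'd1;
  fsm.in = ex[go] & fsm.out != 'd1 ? add.out;  // add computes fsm.out + 1
  fsm.write_en = ex[go] ? 1'd1;
}
// Continuous assignments that reset the FSM after the final cycle:
fsm.in = fsm.out == 'd1 ? 'd0;
fsm.write_en = fsm.out == 'd1 ? 1'd1;
```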
The last two assignments are continuous since we need to be able to reset the FSM that provides the relative clock. In a fully static program, ideally we'd generate one static FSM per `par` thread.