cucapra / dahlia

Time-sensitive affine types for predictable hardware generation
https://capra.cs.cornell.edu/dahlia
MIT License

Functions vs Macros #140

Open rachitnigam opened 5 years ago

rachitnigam commented 5 years ago

Macros

In the current semantics of Fuse, there are no true "functions". All defs should be thought of as simple macros that are expanded at each call site. This corresponds to having k copies of the RTL if there are k calls to a function. Furthermore, an unroll context multiplies the number of RTL blocks for a function by the unroll factor.

For example, in this code:

void foo(...) { ... }
foo(...);
for (...) unroll k { foo(...) }
foo(...);

there should be exactly k + 2 copies of foo in the hardware design.
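To make the counting rule concrete, here is a rough Python sketch (not the Fuse compiler's code) that counts RTL copies under macro semantics. The `Call`/`Unroll` mini-AST and the `rtl_copies` and `example` helpers are hypothetical names made up for this illustration. Each syntactic call contributes one copy, and an `unroll k` context multiplies its body by k.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical mini-AST for illustration; not the Fuse compiler's types.
@dataclass
class Call:
    name: str

@dataclass
class Unroll:
    factor: int
    body: List["Stmt"]

Stmt = Union[Call, Unroll]

def rtl_copies(prog: List[Stmt], fn: str) -> int:
    """Number of RTL copies of `fn` when every def is expanded like a macro."""
    total = 0
    for stmt in prog:
        if isinstance(stmt, Call):
            # Each syntactic call site expands into its own copy.
            total += 1 if stmt.name == fn else 0
        else:
            # An unroll context duplicates its body `factor` times.
            total += stmt.factor * rtl_copies(stmt.body, fn)
    return total

def example(k: int) -> List[Stmt]:
    # foo(...); for (...) unroll k { foo(...) }; foo(...);
    return [Call("foo"), Unroll(k, [Call("foo")]), Call("foo")]

for k in (1, 4, 8):
    print(k, rtl_copies(example(k), "foo"))  # prints k and k + 2
```

Nested unroll contexts multiply, so `unroll 2 { unroll 3 { foo(...) } }` yields 6 copies under this model.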

Functions

On the other hand, a true function in Fuse would represent exactly one RTL block regardless of the number of syntactic calls. This has a few implications for the semantics.

  1. This code shouldn't work: foo(...); foo(...), since the same RTL block would be sent two signals in parallel.
  2. For the same reason, a call to foo(...) inside an unrolled loop is also incorrect, because the same RTL block is being invoked k times in a single cycle.
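A minimal sketch of the check that true functions would require, written in Python with a hypothetical encoding: a program is a list of time steps, and each step lists (function, unroll factor) pairs for the calls composed in parallel with `;`. It assumes, as a later comment in this thread does, that `---` starts a new time step, so `foo(...) --- foo(...)` is accepted while both programs above are rejected. `check_true_functions` is an illustrative name, not part of Fuse.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical encoding: a program is a list of time steps; each step lists
# (function name, unroll factor) pairs for the calls composed in parallel with `;`.
Step = List[Tuple[str, int]]

def check_true_functions(steps: List[Step]) -> List[str]:
    """Report each step that invokes a true function (one RTL block) more than once."""
    errors = []
    for i, step in enumerate(steps):
        uses = Counter()
        for name, unroll in step:
            uses[name] += unroll  # an unrolled call fires `unroll` times in one step
        for name, n in sorted(uses.items()):
            if n > 1:
                errors.append(f"step {i}: {name} invoked {n} times but has 1 RTL block")
    return errors

print(check_true_functions([[("foo", 1), ("foo", 1)]]))    # foo(...); foo(...)
print(check_true_functions([[("foo", 4)]]))                # unroll 4 { foo(...) }
print(check_true_functions([[("foo", 1)], [("foo", 1)]]))  # foo(...) --- foo(...): []
```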

Obviously, this notion of functions is very restrictive.

Vivado's default

According to this SDAccel page, syntactic functions have the following defaults:

By default:

  • Functions remain as separate hierarchy blocks in the RTL.
  • All instances of a function, at the same level of hierarchy, make use of a single RTL implementation (block).

The notion of a hierarchy here is fuzzy to me (maybe @sa2257 can clarify). However, this seems to imply that all functions in emitted C++ code are true functions according to our definition.

Things to consider

We have to consider a few things before we add true functions to the language:

  1. The current implementation needs to change to emit the inline pragma for all functions.
  2. We need to understand the Function instantiate pragma and see if we want to use it.
  3. We have to figure out how the dataflow pragma and the pipelining pragma interact with each other and with true functions at the C++ level.

All of these considerations will inform the future design of fuse-to-rtl. I suggest that for the first paper, we only think of defs as macros and not support actual functions until Fuse 2.0.

sampsyo commented 5 years ago

Great writeup! Thanks for crafting a ticket for this frequent discussion point. Here are a few quick notes:

rachitnigam commented 5 years ago

Spatial inlines functions. See under "Using Functions".

rachitnigam commented 5 years ago

Affine functions

Real, reusable functions that specify how many RTL block instances implement them. For example (made-up syntax):

def foo{N}(...)

creates N instances of the function foo (using the allocation pragma). Each call to foo then consumes one instance:

foo(a, b); foo(a, b); // two copies consumed

And the reasoning extends to unrolled loops:

for (...) unroll 4 {
  foo(a, b); // 4 copies consumed
}

Sequential composition with --- regenerates copies of foo. This claim is slightly sketchy because a function call may take multiple cycles.

foo(a, b); // N copies available
---
foo(a, b); // N copies available
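Here is a rough Python sketch of how a checker could track these resources. It is an illustration under assumptions, not a proposed implementation: `def foo{N}` is modeled as an entry in a hypothetical `instances` map, a program is a list of time steps separated by `---`, each step lists (function, unroll factor) pairs, and every step starts with all N instances available again (the sketchy assumption above that calls finish within their step). `check_affine_calls` and `ResourceError` are made-up names.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding (not Dahlia syntax): each time step is a list of
# (function name, unroll factor) pairs for the calls composed with `;`.
Step = List[Tuple[str, int]]

class ResourceError(Exception):
    """Raised when a time step uses more instances of a function than exist."""

def check_affine_calls(instances: Dict[str, int], steps: List[Step]) -> None:
    for i, step in enumerate(steps):
        # Assumption: `---` regenerates all instances, i.e. every call
        # finishes before the next time step begins.
        available = dict(instances)
        for name, unroll in step:
            # A call inside `unroll k` consumes k instances at once.
            if available[name] < unroll:
                raise ResourceError(
                    f"step {i}: {name} needs {unroll} instance(s), "
                    f"only {available[name]} left")
            available[name] -= unroll

# def foo{2}: two calls composed with `;` consume both copies.
check_affine_calls({"foo": 2}, [[("foo", 1), ("foo", 1)]])
# def foo{1}: `---` between the calls makes the single copy available again.
check_affine_calls({"foo": 1}, [[("foo", 1)], [("foo", 1)]])
# def foo{2}: an `unroll 4` loop needs 4 copies in one step.
try:
    check_affine_calls({"foo": 2}, [[("foo", 4)]])
except ResourceError as e:
    print(e)  # step 0: foo needs 4 instance(s), only 2 left
```

With N = 1 this reduces to the "true function" discipline, which is one way to see affine functions as a generalization of it.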

The type-theoretic ideas might be related to graded modalities (I might be wrong about this, but the idea of affine resources that can be consumed a finite number of times exists).

Bonus implementation point: multi-ported memories will already need a similar style of reasoning to work.

DSE

Allowing source-level reasoning about these resources will help with area-efficiency trade-offs, and maybe we can eventually infer N using the polymorphism extension.
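If calls consume instances per time step and `---` fully replenishes them, the smallest workable N for each function is the maximum number of instances any single step consumes. This is a speculative Python sketch of that inference. The encoding is hypothetical (a program is a list of steps, each a list of (function, unroll factor) pairs), and `infer_min_instances` is a made-up name, not a planned Fuse pass.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical encoding: each time step lists (function name, unroll factor) pairs.
Step = List[Tuple[str, int]]

def infer_min_instances(steps: List[Step]) -> Dict[str, int]:
    """Smallest N per function such that no time step consumes more than N
    instances, assuming `---` replenishes all instances between steps."""
    need: Dict[str, int] = {}
    for step in steps:
        per_step = Counter()
        for name, unroll in step:
            per_step[name] += unroll
        for name, n in per_step.items():
            need[name] = max(need.get(name, 0), n)
    return need

prog = [
    [("foo", 1), ("foo", 1)],   # foo(a, b); foo(a, b)
    [("foo", 4)],               # for (...) unroll 4 { foo(a, b) }
    [("foo", 1), ("bar", 2)],   # foo(a, b); for (...) unroll 2 { bar(...) }
]
print(infer_min_instances(prog))  # {'foo': 4, 'bar': 2}
```

Under this model, choosing a smaller N than inferred would require splitting a step with ---, which is one concrete form the area trade-off could take.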

sampsyo commented 5 years ago

Neat! I do think time steps should indeed replenish function resources, as they do memory banks—the reason being that function calls (unless we do something drastic) are synchronous, i.e., a function call waits for the entire function to finish. If we allowed async calls, this would get more complicated—the function resource would probably need to stay consumed past the time step and only get released when synchronizing the result (like forcing a future).
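A rough Python model of the async variant, under assumptions: a program is a flat, hypothetical list of events where ("call", f) is a synchronous call whose instance comes back at the next ---, ("async", f, h) starts a call returning future h, ("force", h) synchronizes on h and releases its instance, and ("---",) is a time-step boundary. `check_with_futures` and the event format are invented for illustration; the point is only that an async call keeps its instance consumed across time steps until it is forced.

```python
from typing import Dict, List, Tuple

class ResourceError(Exception):
    """Raised when a call finds no free instance of a function."""

def check_with_futures(instances: Dict[str, int], events: List[Tuple[str, ...]]) -> None:
    # Instances used by synchronous calls; these come back at every `---`.
    sync_used = {f: 0 for f in instances}
    # Outstanding futures: handle -> function whose instance it still holds.
    pending: Dict[str, str] = {}

    for ev in events:
        kind = ev[0]
        if kind == "---":
            for f in sync_used:
                sync_used[f] = 0
        elif kind == "force":
            # Synchronizing on the result releases the instance.
            del pending[ev[1]]
        elif kind in ("call", "async"):
            f = ev[1]
            held = sum(1 for g in pending.values() if g == f)
            if instances[f] - sync_used[f] - held < 1:
                raise ResourceError(f"no free instance of {f}")
            if kind == "call":
                sync_used[f] += 1
            else:
                pending[ev[2]] = f
        else:
            raise ValueError(f"unknown event {ev!r}")

# An async call keeps foo's only instance busy across `---` until it is forced.
check_with_futures({"foo": 1}, [("async", "foo", "x"), ("---",), ("force", "x"), ("call", "foo")])
try:
    check_with_futures({"foo": 1}, [("async", "foo", "x"), ("---",), ("call", "foo")])
except ResourceError as e:
    print(e)  # no free instance of foo
```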