emdash / udlang

A practical, functional language for stream processing.
GNU Lesser General Public License v3.0
1 stars 0 forks source link

Partial Evaluation #37

Open emdash opened 3 years ago

emdash commented 3 years ago

Partial Evaluation

Partial functions are implemented by incrementally constructing a callable on the stack, called a thunk. The distinction between a thunk and a plain function is implementation-specific, but in particular, a block is a statically known sequence of instructions, while a thunks can in principle be constructed at runtime or at compile time.

Thunks are first-class values. They can be applied, like ordinary functions, since they are callable. But they can also compose at the instruction level. Thunks are built up incrementally on the stack by applying operations to thunk operands. Only calling a thunk leaves an ordinary (non-thunk) value on the stack.

Syntax

$ - each instance creates a distinct parameter $..., rest place holder, can appear at most once, in the final argument position.

Extended syntax

These are intuitive, but might be tricky to implement as described below. $0, $1... etc, positional place holders, each can appear more than once. arity is implied by max() over all indices. $0..3, slicing placeholder, can appear more than once. $3.., slicing rest place holder.

Operations on Thunks

Any instruction can consume one or more thunk operands, and the result will yield a thunk.

Informal Examples

The identity thunk is written as thunk([arg(0)], 1, 1, []). This is a thunk which simply returns is argument unchanged when called. The placeholder instruction simply places this thunk on the stack.

Here are some examples to get a feel for how thunks are constructed incrementally. Here $ is used as shorthand for the placeholder instruction.

Special case for partially applied function calls:

Formal Description

Thunk Monoid

Thunks form a monoid with $ and concatenation.

Instruction Lifting

In general, other than arg(x, y), an arbitrary instruction X can be lifted to a thunk: lift(X) => thunk([X], 0, r(x), []), where and r(X) is the return arity (usually 1, but 0 in the case of out, and could be larger in the presence of de-structuring, for example), and [] is an empty capture list.

For arg instructions, this becomes: lift(arg(x, y)) => thunk([arg(1, 0)], 0, 1, [arg(x, y)]), i.e. the argument value is captured, and the resulting thunk takes no arguments and returns the captured value.

Note that the argument arity of the lifted instruction is always 0, which can result in degenerate thunks which cannot be called without causing a stack underflow. This is acceptable because the lifted thunks are never placed on the stack directly, but concatenated as described below. Defining lifting this way is crucial for this definition of concatenation.

The well-formedness of the resulting thunk depends on the well-formedness of the instruction sequence producing the thunk.

Instructions consuming thunk operands

Using the definitions above, we can extend arbitrary instructions to support thunk operands.

Here t<n> represents an arbitrary thunk operand, and v denotes an arbitrary non-thunk value:

Ternary and higher cases omitted here for brevity, but they amount to a straight-forward extension of this pattern. An operand is either a thunk, or a plain value. Plain values and instructions are lifted to thunks, all thunks concatenated, and the result placed on the stack.

General thunk concatenation

The "++" operation on thunks denotes concatenation.

For any two thunks left = thunk(lblock, largc, lretc, lcaptures), and right = thunk(rblock, rargc, retc, rcaptures), their concatenation left ++ right is as defined as:

left ++ right = thunk(lblock ++ rblock.map(adjust_args), largc + rargc, rretc, lcaptures ++ rcaptures), where adjust_args is defined below.

That is to say: instruction blocks are concatenated, adjusting argument indices as needed. The argument arity of the resulting thunk is the sum of the left and right argument arities, while the return arity is simply that of the right thunk (it is assumed that the right thunk's code block consumes any values which the left thunk leaves on the stack). Capture lists are also concatenated.

Argument Adjustment

Argument frame 0 denotes the thunk's formal parameters. Argument frame 1 is used for captured thunk values.

Most instructions on the right hand side concatenate into a thunk unchanged; however, arg instructions must be adjusted to bind the correct argument inside the thunk.

Special Case: Thunks as Function Arguments

TBD, expand on how this can be optimized by merging thunks eargerly. May be more appropriate for compile-time vs runtime.

Interaction with const-folding

Thunk concatenation can occur at compile-time during const-folding.

Const-folding may happen in the front-end at the AST level, but it should result in the same instruction sequences in generated code. Moreover, const-folding is allowed to apply thunks when all arguments are known, so the result may not appear in the generated code at all in this case.