evincarofautumn / hap-hs

Haskell [re]implementation of Hap, a simple event-based programming language.
MIT License
12 stars 2 forks source link

Concrete syntax #4

Open evincarofautumn opened 4 years ago

evincarofautumn commented 4 years ago

I’d like to use Hap as an opportunity to explore some fun new syntax ideas, while being careful not to blow the weirdness budget entirely, deferring to conventional dynamic imperative languages like JavaScript when there’s no good reason to break familiarity. Some ideas:

evincarofautumn commented 4 years ago

Sketching out some thoughts…

Clearly differentiate initialisation, reassignment/update, reactive binding, and equality

Expression-oriented syntax

This is fairly easy to do with an operator precedence parser, but introduces considerably more flexibility in combining forms that must be accounted for; for instance, if else can be used as an operator separately from if, then it must have a consistent meaning in other contexts.

This is an attractive approach, though, because it makes it easy to add new, user-defined syntactic forms in a consistent and predictable way, and makes parsing simple and regular. It also requires accounting for all combinations of the semantics of things that can be combined syntactically.

I think statement forms should be able to indicate some form of failure, perhaps by returning an error value rather than raising an exception, and then else can check the success or failure of its left operand and evaluate its right operand if the left failed. Conditional operators like if succeed if they evaluate their body. loops like while succeed if they evaluate their body at least once, so the pattern while (A) { B } else { C } executes C if A was initially false and thus B was never evaluated, and similarly for for each (A) { B } else { C }. Asynchronous loops like for all succeed as long as there are elements in their container operand, and fail when that operand becomes empty, enabling e.g.:

for all (block : blocks) {
  when (overlapping(block, any(goal))) {
    remove (block) from (blocks);
  }
} else {
  // The player has beaten the level when all the blocks are in goals.
  win();
}

An undefined value or user-specified failure allows the use of else to select alternative or default values, e.g. input source = (controller) else (keyboard);.

Multi-word identifiers

The idea is to interpret a series of adjacent name parts as a single name. Just like many languages use a pattern like /[A-Za-z_][0-9A-Za-z_]*/ for identifiers, requiring that names begin with an alphabetic character or underscore but allowing them to contain digits, name parts might have different constraints in the head or tail of a multi-word name, such as allowing digit-only name parts after the first, e.g. player 1.

This requires some disambiguation with keywords to prevent them from coalescing. A simple approach is to disallow names from containing or beginning with a keyword, but since keywords tend to be common, short words, this may be too limiting; it would disallow names like all levels or switch on (if all and on are keywords). Keywords could be allowed in identifiers and separated using symbolic syntax, e.g. case of supplies is a name but case (of supplies) is a keyword followed by a bracketed name (if case is a keyword).

A uniform way to deal with this is stropping, marking either keywords or variable names explicitly. Most languages that allow keywords to be used as variable names do so by marking the variables and leaving keywords unmarked, as in C# class (keyword) vs. @class (identifier) or F# let (keyword) vs. ``let`` (identifier). The problem with this is that existing code can still break when new keywords are added. Unfortunately, there’s a strong precedent against stropping keywords, and it leads to visual clutter, e.g.:

\for each (ghost \in ghosts) {
  \let e = ghost.ectoplasm level;
  \if e < séance.power {
    cross over(ghost);
  } \else {
    ++ghost.anger;
  }
}

So another option is to lexically distinguish keywords from names so they can’t collide at all, such as by capitalising one or the other:

For Each (ghost In ghosts) {
  Let e = ghost.ectoplasm level;
  If e < séance.power {
    cross over(ghost);
  } Else {
    ++ghost.anger;
  }
}
for each (Ghost in Ghosts) {
  let E = Ghost.Ectoplasm Level;
  if E < Séance.Power {
    Cross Over(Ghost);
  } else {
    ++Ghost.Anger;
  }
}

This introduces difficulty for beginners, though, who already struggle with case-sensitivity.

It’s also valid to strop all identifiers, which solves the problem of adding new keywords and allows using more keywords rather than symbols, but also introduces considerable noise:

for each [ghost] in [ghosts] {
  let [e] = [ghost].[ectoplasm level];
  if [e] < [séance].[power] {
    [cross over]([ghost]);
  } else {
    ++[ghost].[anger];
  }
}
for each [ghost] in [ghosts] do
  let [e] be [ectoplasm level] of [ghost];
  if [e] is less than [power] of [séance] then
    [cross over] ( [ghost] );
  else
    increment [anger] of [ghost];
  end if;
end for

Iteration operators

For parallel iteration, some expression e containing subexpressions of the form each e, e0[each e1, …, each en], is equivalent to zip withx1. … λxn. e0[x1/each e1, …, xn/each en]) e1en, that is, all of the containers are zipped together with the expression, so each [1, 2, 3] + each [4, 5, 6] = [5, 7, 9].

For nested iteration, e0[every e1, …, every en], is equivalent to flat mapx1. … flat mapxn. e0[x1/every e1, …, xn/every en]) en …) e1, so every [1, 2, 3] * every [5, 7, 11] = [5, 7, 11, 10, 14, 22, 15, 21, 33]

In the simple case of a single filter parameter, e0[which e1] = filterx1. e0[x1/which e1]), so which [5, 10, 15] <= 10 = [5, 10]. When multiple parameters are involved, they are combined as if by every with tupling, and the condition is tested on each tuple: e0[which e1, …, which en] = filter (λ (x1, …, xn). e0[x1/which e1, …, xn/which en]) (zip e1en), so which [5, 10] < which [10, 20] = [(5, 10), (5, 20), (10, 20)], that is, all combinations of values from each container such that the condition is true: W(P, e1, …, en) = { (x1, …, xn) | x1e1, …, xnen, P(x) }.

all, some, none, and how many operate like which, filtering the Cartesian product of their container operands, except that they return Booleans indicating the number of tuples for which the condition held.

These could have several derived forms based on other English determiners/quantifiers, but these seem less generally useful:

where, on indexed containers, performs a selection, returning the set of keys for which a condition is true, rather than the values, so if xs = [1, 2, 3, 4], then where(xs) < 3 = {0, 1} because xs[0] < 3 and xs[1] < 3, and if m = { a: 1, b: 2, c: 3 }, then where(m) mod 2 <> 0 = { "a", "c" } because m.a mod 2 <> 0 and m.c mod 2 <> 0.

To confine the scope of the iteration to a subexpression rather than a whole expression, it may be necessary to introduce some form of scoping, but I think it’s preferable to keep these expressions simple and prefer factoring out separate expressions rather than using complex nesting.

Range operators

Unary relational operators such as <x, =x, and >=x return ranges that allow union, intersection, testing for membership, testing for emptiness, and use in case branches.

Continuous operations and time expressions

Numbers can be equipped with time units, and used in operations that run continuously or at intervals, such as after (1 second) denoting a Boolean that becomes true when 1 second has elapsed after the evaluation of the expression, or every (1 second) for a repeating timer (although this collides with every for iteration).

Likewise, events could be related in time: when (1 second after x = 0) { f(); } is equivalent to something like when (x = 0) { wait(1 second); f(); }.

Anaphora and type-based references

A limited form of anaphora to refer to values by things other than their names could be useful, although it could make code difficult to read if it has complex rules or encourages excessive use. the (type) to refer to the nearest in-scope value matching type seems to strike a good balance amongst utility, readability, and maintainability. it would be suitable for short anonymous functions in a similar vein to the iteration quantifiers above: e[it] = λx. e[x/it], so 5 * it = function (x) { return 5 * x }. This also has the issue of scope, though: how big is the lambda? “As large as possible” and “as small as possible” are both the wrong heuristic in some common circumstances.

evincarofautumn commented 4 years ago

A major source of inspiration for the syntactic–semantic design here is Pane, Ratanamahatana, & Myers: Studying the Language and Structure in Non-Programmers’ Solutions to Programming Problems.

Hap already has the following, or they’re in progress:

Other questions directly from or inspired by the paper that don’t have a clear answer yet:

evincarofautumn commented 4 years ago

Some recent decisions:

Possible next directions: