Open evincarofautumn opened 4 years ago
Sketching out some thoughts…
Clearly differentiate initialisation, reassignment/update, reactive binding, and equality
=
has a strong precedent for definition, initialisation, and assignment as in var name = initialiser;
or variable = expression;
I would prefer to use a single =
to denote equality comparison
==
has a strong precedent for equality
Confusion between =
for assignment and ==
for equality can lead to bugs if they’re permitted in the same context
=
/==
confusion can be mitigated by having =
for assignment act as a statement or, when used as an expression, return a unit value that cannot be mixed up with a conditional/Boolean
Other languages generally lack a notion of reactive binding, so we’re free to use any evocative symbol; I like :=
for this
A leftward arrow <-
is evocative of updating (put the rhs into the lhs), but this conflicts with the desire to include unary range operators, because <-5
could be interpreted as either (<- (5))
or (< (- (5)))
Restricting to ASCII, leftward arrows and range operators could be differentiated by whitespace or brackets, and it’s easy to provide good error messages: “The expression <-5
looks like an assignment expression missing its left operand. If you wanted a range instead, use < -5
or <(-5)
.”
A rightward arrow ->
or =>
is also suitable for assignment (the lhs “goes to” or “becomes” the value of the rhs)
We could use keywords instead, either as operators (e.g. is
for initialisation, gets
, becomes
, or is now
for reassignment; means
or is always
for reactive binding; and is
, equals
, or eq
for equality) or statement-like forms (let name be expression;
, set lvalue to expression;
, def name as expression;
)
There is a conflict between descriptive, verbose keywords for these primitive operations and allowing multi-word identifiers
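To make the <-5 ambiguity from the leftward-arrow point concrete, here is a minimal Python sketch of a longest-match tokenizer that reads <- as a single token and then produces the hint suggested above. The token set and the names tokenize and prefix_arrow_hint are hypothetical, chosen only for illustration; this is not Hap's actual lexer.

```python
import re
from typing import List, Optional

# Hypothetical token set. The arrow `<-` is listed before `<` and `-`,
# so the lexer prefers it whenever the two characters are adjacent.
TOKEN = re.compile(r"\s*(<-|<=|>=|<|>|-|\(|\)|\d+|[A-Za-z]+)")

def tokenize(source: str) -> List[str]:
    """Split source into tokens, trying `<-` before the shorter operators."""
    source = source.rstrip()
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def prefix_arrow_hint(source: str) -> Optional[str]:
    """Return a hint when an expression starts with `<-` and so has no left operand."""
    tokens = tokenize(source)
    if len(tokens) >= 2 and tokens[0] == "<-":
        operand = "".join(tokens[1:])
        return (
            f"The expression <-{operand} looks like an assignment expression "
            f"missing its left operand. If you wanted a range instead, "
            f"use < -{operand} or <(-{operand})."
        )
    return None

print(tokenize("<-5"))    # ['<-', '5']
print(tokenize("< -5"))   # ['<', '-', '5']
print(tokenize("<(-5)"))  # ['<', '(', '-', '5', ')']
```

Whitespace or brackets are enough to split the arrow, so the two readings never share a token stream.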
Expression-oriented syntax
This is fairly easy to do with an operator precedence parser, but introduces considerably more flexibility in combining forms that must be accounted for; for instance, if else
can be used as an operator separately from if
, then it must have a consistent meaning in other contexts.
This is an attractive approach, though, because it makes it easy to add new, user-defined syntactic forms in a consistent and predictable way, and makes parsing simple and regular. It also requires accounting for all combinations of the semantics of things that can be combined syntactically.
I think statement forms should be able to indicate some form of failure, perhaps by returning an error value rather than raising an exception, and then else
can check the success or failure of its left operand and evaluate its right operand if the left failed. Conditional operators like if
succeed if they evaluate their body. Loops like while
succeed if they evaluate their body at least once, so the pattern while (A) { B } else { C }
executes C
if A
was initially false and thus B
was never evaluated, and similarly for for each (A) { B } else { C }
. Asynchronous loops like for all
succeed as long as there are elements in their container operand, and fail when that operand becomes empty, enabling e.g.:
for all (block : blocks) {
  when (overlapping(block, any(goal))) {
    remove (block) from (blocks);
  }
} else {
  // The player has beaten the level when all the blocks are in goals.
  win();
}
An undefined value or user-specified failure allows the use of else
to select alternative or default values, e.g. input source = (controller) else (keyboard);
.
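As a rough model of the success/failure rule for loops and of else as a default selector, here is a small Python sketch. Python's built-in while ... else means something different (its else runs whenever the loop ends without break), so the proposed behaviour is written out as explicit functions. The names while_else and or_else are hypothetical, and None stands in for an undefined value.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

def while_else(cond: Callable[[], bool],
               body: Callable[[], None],
               otherwise: Callable[[], None]) -> bool:
    """Run body while cond holds; run otherwise only if body never ran.

    Returns True (success) if the body was evaluated at least once.
    """
    ran = False
    while cond():
        body()
        ran = True
    if not ran:
        otherwise()
    return ran

def or_else(value: Optional[T], default: T) -> T:
    """Model `(a) else (b)`: use b when a is undefined (None in this sketch)."""
    return default if value is None else value

controller = None  # pretend no controller is plugged in
input_source = or_else(controller, "keyboard")
print(input_source)  # keyboard
```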
Multi-word identifiers
The idea is to interpret a series of adjacent name parts as a single name. Just like many languages use a pattern like /[A-Za-z_][0-9A-Za-z_]*/
for identifiers, requiring that names begin with an alphabetic character or underscore but allowing them to contain digits, name parts might have different constraints in the head or tail of a multi-word name, such as allowing digit-only name parts after the first, e.g. player 1
.
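A minimal Python sketch of that head/tail rule, assuming name parts are separated by spaces and that only the first part must start with a letter. The pattern and the lex_name helper are hypothetical; underscores are left out here, in line with the later decision in this thread to drop _ as a word character.

```python
import re
from typing import Optional

# Head part: a letter followed by letters/digits.
# Tail parts: any run of letters/digits, so digit-only parts like "1" are allowed.
NAME = re.compile(r"[A-Za-z][0-9A-Za-z]*(?: +[0-9A-Za-z]+)*")

def lex_name(source: str, pos: int = 0) -> Optional[str]:
    """Return the multi-word name starting at pos, with spaces normalised, or None."""
    match = NAME.match(source, pos)
    if match is None:
        return None
    return " ".join(match.group().split())

print(lex_name("player 1 = 5"))       # player 1
print(lex_name("left   arrow key"))   # left arrow key
print(lex_name("1 player"))           # None
```

This sketch ignores keywords entirely, which is exactly the gap the following paragraphs deal with.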
This requires some disambiguation with keywords to prevent them from coalescing. A simple approach is to disallow names from containing or beginning with a keyword, but since keywords tend to be common, short words, this may be too limiting; it would disallow names like all levels
or switch on
(if all
and on
are keywords). Keywords could be allowed in identifiers and separated using symbolic syntax, e.g. case of supplies
is a name but case (of supplies)
is a keyword followed by a bracketed name (if case
is a keyword).
A uniform way to deal with this is stropping, marking either keywords or variable names explicitly. Most languages that allow keywords to be used as variable names do so by marking the variables and leaving keywords unmarked, as in C# class
(keyword) vs. @class
(identifier) or F# let
(keyword) vs. ``let``
(identifier). The problem with this is that existing code can still break when new keywords are added. Unfortunately, there’s a strong precedent against stropping keywords, and it leads to visual clutter, e.g.:
\for each (ghost \in ghosts) {
  \let e = ghost.ectoplasm level;
  \if e < séance.power {
    cross over(ghost);
  } \else {
    ++ghost.anger;
  }
}
So another option is to lexically distinguish keywords from names so they can’t collide at all, such as by capitalising one or the other:
For Each (ghost In ghosts) {
  Let e = ghost.ectoplasm level;
  If e < séance.power {
    cross over(ghost);
  } Else {
    ++ghost.anger;
  }
}

for each (Ghost in Ghosts) {
  let E = Ghost.Ectoplasm Level;
  if E < Séance.Power {
    Cross Over(Ghost);
  } else {
    ++Ghost.Anger;
  }
}
This introduces difficulty for beginners, though, who already struggle with case-sensitivity.
It’s also valid to strop all identifiers, which solves the problem of adding new keywords and allows using more keywords rather than symbols, but also introduces considerable noise:
for each [ghost] in [ghosts] {
  let [e] = [ghost].[ectoplasm level];
  if [e] < [séance].[power] {
    [cross over]([ghost]);
  } else {
    ++[ghost].[anger];
  }
}

for each [ghost] in [ghosts] do
  let [e] be [ectoplasm level] of [ghost];
  if [e] is less than [power] of [séance] then
    [cross over] ( [ghost] );
  else
    increment [anger] of [ghost];
  end if;
end for
Iteration operators
For parallel iteration, some expression e containing subexpressions of the form each
e, e0[each
e1, …, each
en], is equivalent to zip with
(λx1. … λxn. e0[x1/each
e1, …, xn/each
en]) e1 … en, that is, all of the containers are zipped together with the expression, so each [1, 2, 3] + each [4, 5, 6]
= [5, 7, 9]
.
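The each desugaring can be mimicked in Python with a small zip_with helper. The helper name is an assumption made for this sketch; the point is only that the surrounding expression becomes the function applied pairwise to the zipped containers.

```python
from operator import add

def zip_with(f, *containers):
    """Apply f to corresponding elements of the containers, in lockstep."""
    return [f(*elements) for elements in zip(*containers)]

# each [1, 2, 3] + each [4, 5, 6]
print(zip_with(add, [1, 2, 3], [4, 5, 6]))  # [5, 7, 9]
```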
For nested iteration, e0[every
e1, …, every
en], is equivalent to flat map
(λx1. … flat map
(λxn. e0[x1/every
e1, …, xn/every
en]) en …) e1, so every [1, 2, 3] * every [5, 7, 11]
= [5, 7, 11, 10, 14, 22, 15, 21, 33]
.
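The every desugaring can be sketched in Python with nested flat maps. The names flat_map and every_times are hypothetical; note that in this sketch the innermost function wraps its result in a one-element list so that flat_map has something to flatten.

```python
def flat_map(f, xs):
    """Apply f to each element and concatenate the resulting lists."""
    return [y for x in xs for y in f(x)]

def every_times(xs, ys):
    """Model `every xs * every ys` as nested flat maps, outer container first."""
    return flat_map(lambda x: flat_map(lambda y: [x * y], ys), xs)

print(every_times([1, 2, 3], [5, 7, 11]))
# [5, 7, 11, 10, 14, 22, 15, 21, 33]
```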
In the simple case of a single filter parameter, e0[which
e1] = filter
(λx1. e0[x1/which
e1]) e1, so which [5, 10, 15] <= 10
= [5, 10]
. When multiple parameters are involved, they are combined as if by every
with tupling, and the condition is tested on each tuple: e0[which
e1, …, which
en] = filter
(λ (x1, …, xn). e0[x1/which
e1, …, xn/which
en]) (e1 × … × en), where × is the Cartesian product, so which [5, 10] < which [10, 20]
= [(5, 10), (5, 20), (10, 20)]
, that is, all combinations of values from each container such that the condition is true: W(P, e1, …, en) = { (x1, …, xn) | x1 ∈ e1, …, xn ∈ en, P(x1, …, xn) }.
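A minimal Python sketch of which, assuming the multi-parameter form filters the Cartesian product of its containers, as the worked example shows. The function name and the choice to return bare values in the single-container case are assumptions of this sketch.

```python
from itertools import product

def which(predicate, *containers):
    """Return the values (or tuples of values) for which predicate holds."""
    tuples = product(*containers)
    if len(containers) == 1:
        # Single container: return plain values rather than 1-tuples.
        return [x for (x,) in tuples if predicate(x)]
    return [t for t in tuples if predicate(*t)]

# which [5, 10, 15] <= 10
print(which(lambda x: x <= 10, [5, 10, 15]))             # [5, 10]
# which [5, 10] < which [10, 20]
print(which(lambda a, b: a < b, [5, 10], [10, 20]))      # [(5, 10), (5, 20), (10, 20)]
```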
all
, some
, none
, and how many
operate like which
, filtering the Cartesian product of their container operands, except that instead of the matching tuples they return a summary of how many tuples satisfied the condition: a Boolean for the first three and a count for the last.
all
returns whether which
with the opposite condition would return empty, or equivalently, whether how many
with the opposite condition would return zero: ∀x. P(x), ¬∃x. ¬P(x), or |W(¬P, ê)| = 0
some
returns whether which
would not return empty, or whether how many
would return nonzero: ∃x. P(x), ¬∀x. ¬P(x), or |W(P, ê)| > 0
none
returns whether which
would return empty, or whether how many
would return zero: ¬∃x. P(x), ∀x. ¬P(x), or |W(P, ê)| = 0
how many
returns the size of the result of which
: |W(P, ê)|
These could have several derived forms based on other English determiners/quantifiers, but these seem less generally useful:
one
returns whether how many
would return exactly 1
multiple
returns whether how many
would return more than 1
proportion
returns the result of how many
divided by the product of the sizes of the inputs
most
returns whether proportion
exceeds 1/2
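The quantifiers listed above can all be written in terms of how many. The following Python sketch does that over the Cartesian product of the containers; the function names are hypothetical (all_ has a trailing underscore only to avoid shadowing Python's built-in all), and proportion is left undefined for empty inputs, where it would divide by zero.

```python
from itertools import product
from math import prod

def how_many(predicate, *containers):
    """Count the tuples of the Cartesian product that satisfy predicate."""
    return sum(1 for t in product(*containers) if predicate(*t))

def all_(predicate, *containers):
    # No tuple satisfies the opposite condition.
    return how_many(lambda *t: not predicate(*t), *containers) == 0

def some(predicate, *containers):
    return how_many(predicate, *containers) > 0

def none(predicate, *containers):
    return how_many(predicate, *containers) == 0

def one(predicate, *containers):
    return how_many(predicate, *containers) == 1

def multiple(predicate, *containers):
    return how_many(predicate, *containers) > 1

def proportion(predicate, *containers):
    # Count divided by the product of the input sizes (fails on empty inputs).
    return how_many(predicate, *containers) / prod(len(c) for c in containers)

def most(predicate, *containers):
    return proportion(predicate, *containers) > 1 / 2

less_than = lambda a, b: a < b
print(how_many(less_than, [5, 10], [10, 20]))    # 3
print(all_(less_than, [5, 10], [10, 20]))        # False, because 10 < 10 fails
print(proportion(less_than, [5, 10], [10, 20]))  # 0.75
```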
where
, on indexed containers, performs a selection, returning the set of keys for which a condition is true, rather than the values, so if xs = [1, 2, 3, 4]
, then where(xs) < 3
= {0, 1}
because xs[0] < 3
and xs[1] < 3
, and if m = { a: 1, b: 2, c: 3 }
, then where(m) mod 2 <> 0
= { "a", "c" }
because m.a mod 2 <> 0
and m.c mod 2 <> 0
.
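A small Python sketch of where, assuming lists are keyed by index and dictionaries by key. The signature where(container, predicate) is a convenience of this sketch, not proposed Hap syntax.

```python
def where(container, predicate):
    """Return the set of keys whose values satisfy predicate."""
    if isinstance(container, dict):
        items = container.items()
    else:
        items = enumerate(container)  # list-like: keys are indices
    return {key for key, value in items if predicate(value)}

xs = [1, 2, 3, 4]
m = {"a": 1, "b": 2, "c": 3}

# where(xs) < 3
assert where(xs, lambda x: x < 3) == {0, 1}
# where(m) mod 2 <> 0
assert where(m, lambda v: v % 2 != 0) == {"a", "c"}
```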
To confine the scope of the iteration to a subexpression rather than a whole expression, it may be necessary to introduce some form of scoping, but I think it’s preferable to keep these expressions simple and prefer factoring out separate expressions rather than using complex nesting.
Range operators
Unary relational operators such as <x
, =x
, and >=x
return ranges that allow union, intersection, testing for membership, testing for emptiness, and use in case
branches.
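One possible model of such ranges, sketched in Python as a single interval with optional open or closed bounds. Everything here is an assumption for illustration: the Range class, the constructor names, and the restriction to real-valued bounds. Union is omitted because it generally needs a list of intervals rather than one.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Range:
    lo: float = -math.inf
    hi: float = math.inf
    lo_closed: bool = False
    hi_closed: bool = False

    def __contains__(self, x) -> bool:
        above = x > self.lo or (self.lo_closed and x == self.lo)
        below = x < self.hi or (self.hi_closed and x == self.hi)
        return above and below

    def is_empty(self) -> bool:
        if self.lo < self.hi:
            return False  # assumes a dense (real-valued) ordering
        if self.lo == self.hi:
            return not (self.lo_closed and self.hi_closed)
        return True

    def intersect(self, other: "Range") -> "Range":
        # Keep the tighter bound on each side; on ties, closed only if both are.
        if self.lo > other.lo:
            lo, lo_closed = self.lo, self.lo_closed
        elif other.lo > self.lo:
            lo, lo_closed = other.lo, other.lo_closed
        else:
            lo, lo_closed = self.lo, self.lo_closed and other.lo_closed
        if self.hi < other.hi:
            hi, hi_closed = self.hi, self.hi_closed
        elif other.hi < self.hi:
            hi, hi_closed = other.hi, other.hi_closed
        else:
            hi, hi_closed = self.hi, self.hi_closed and other.hi_closed
        return Range(lo, hi, lo_closed, hi_closed)

def less_than(x):      # <x
    return Range(hi=x)

def at_least(x):       # >=x
    return Range(lo=x, lo_closed=True)

def equal_to(x):       # =x
    return Range(x, x, True, True)

health = at_least(0).intersect(less_than(5))
print(0 in health, 5 in health)                           # True False
print(less_than(0).intersect(at_least(0)).is_empty())     # True
```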
Continuous operations and time expressions
Numbers can be equipped with time units, and used in operations that run continuously or at intervals, such as after (1 second)
denoting a Boolean that becomes true when 1 second has elapsed after the evaluation of the expression, or every (1 second)
for a repeating timer (although this collides with every
for iteration).
Likewise, events could be related in time: when (1 second after x = 0) { f(); }
is equivalent to something like when (x = 0) { wait(1 second); f(); }
.
Anaphora and type-based references
A limited form of anaphora to refer to values by things other than their names could be useful, although it could make code difficult to read if it has complex rules or encourages excessive use. the (type)
to refer to the nearest in-scope value matching type
seems to strike a good balance amongst utility, readability, and maintainability. it
would be suitable for short anonymous functions in a similar vein to the iteration quantifiers above: e[it
] = λx. e[x/it
], so 5 * it
= function (x) { return 5 * x }
. This also has the issue of scope, though: how big is the lambda? “As large as possible” and “as small as possible” are both the wrong heuristic in some common circumstances.
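The scope problem for it shows up even in a toy model. In this Python sketch, a hypothetical Placeholder object turns a multiplication into a one-argument function, which amounts to the "as small as possible" heuristic: the lambda closes at the nearest operator.

```python
class Placeholder:
    """Stand-in for `it`: multiplying by it yields a one-argument function."""

    def __mul__(self, right):
        return lambda x: x * right

    def __rmul__(self, left):
        return lambda x: left * x

it = Placeholder()

times_five = 5 * it      # behaves like function (x) { return 5 * x }
print(times_five(3))     # 15
```

With this smallest-scope reading, 5 * it + 1 raises a TypeError in Python, because the function produced by 5 * it is then added to 1 instead of the addition being part of the lambda body.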
A major source of inspiration for the syntactic–semantic design here is Pane, Ratanamahatana, & Myers: Studying the Language and Structure in Non-Programmers’ Solutions to Programming Problems.
Hap already has the following, or they’re in progress:
The overall program structure is biased toward events, with imperative actions secondary
Iteration quantifiers (described above) provide container-level and function-level iteration over sets and subsets, rather than object-level loops; the vast majority of looping is implicit
State is maintained using mainly behaviours attached to entities, with a minority described using explicit updates
and
is primarily Boolean conjunction, secondarily sequencing
or
is primarily Boolean disjunction, secondarily “else”, “otherwise”, clarification, or restatement
then
is primarily sequencing
Conditionals are specified primarily using mutually exclusive rules, or a general case with exceptions (using e.g. but
), and secondarily with Boolean logic
Time and motion are continuous; relationships between past and present are implicit in events, or specified using time relations like after
Other questions directly from or inspired by the paper that don’t have a clear answer yet:
How should Hap differentiate between sequences of actions that can be interrupted vs. those that must execute as a unit? The tentative idea is to differentiate e.g. the discrete/imperative while
(evaluate the body as a unit) from the continuous/event-oriented as long as
(as soon as the condition becomes false, at any point in the body, it stops evaluating) but this raises hairy questions of “transactions” (such as needing to use atomically
within as long as
to group statements)
How should the user specify constraints or invariants that should always hold (“the player cannot move outside the screen”) or declarative specifications of situations (e.g. “there are 4 blocks”)?
There should be some way to talk about all instances of an object or entity (#2) and refer to nearby/obvious objects anaphorically
What is the perspective of program structures? First-person as the user or as the programmer, second-person as the programmer addressing the user, or third-person narrator?
Iteration constructs and quantifiers should allow talking about negation or inverses of sets, even if they aren’t actually enumerable
What is the precedence of not
? Boolean logic uses a convention of high precedence, but the default English interpretation has low precedence
Some recent decisions:
[x] Require delimiters around statement bodies, to avoid both “dangling else” and ambiguity with map/set literals
[x] Split keywords into primary/secondary/contextual; an identifier may contain any of them, but may not begin with a primary keyword, and will only parse as a contextual keyword when it appears alone
[x] Standardise on spaces as word/digit separators
[ ] Allow other reasonable word characters like apostrophes and dashes
[x] Remove _
as a word character
[x] Use _
as the subscript operator, freeing up [
]
brackets
[x] Don’t bother with exponential/scientific notation
[ ] Use #
as a number prefix for alternate bases
[ ] Make default base 16 for things like colour codes: slate gray = #708090
[ ] Subscript for explicit radix, like mathematical notation: #CAFE BABE_16
, #1010_2
, #Aa0+/=_64
[x] Use nested quotes for text splices instead of backslash escapes (need to supply character name constants in standard library instead)
[ ] Support curved quotes
[ ] Support multi-line text (leaning toward blockquote style, with prefix on each line)
[ ] Retain comments instead of discarding them
Possible next directions:
[ ] Limited whitespace sensitivity
[ ] Add generic block statement with keyword (e.g. do { … }
, but maybe not do
)
[ ] Add multi-line comment notation (not a lot of evidence that this is intuitive/usable/desirable)
[ ] Merge statements and expressions (“no sublanguages”)
I’d like to use Hap as an opportunity to explore some fun new syntax ideas, while being careful not to blow the weirdness budget entirely, deferring to conventional dynamic imperative languages like JavaScript when there’s no good reason to break familiarity. Some ideas:
Clearly differentiate initialisation, reassignment/update, reactive binding, and equality
Expression-oriented syntax (e.g. if can be used as a statement or expression, on and async return handler IDs, &c.)
Operator precedence grammar (e.g. if X Y and X else Y as operators)
Multi-word identifiers (e.g. left arrow key) disambiguated by syntax/keywords/stropping
Iteration operators to subsume common looping patterns
Range operators
Event- and game-specific features (e.g. continuous operations and time expressions like every (1 second) { … })
Anaphora and type-based references (it, the Image)