fmease / lushui

The reference compiler of the Lushui programming language
Apache License 2.0

More ergonomic paths containing punctuation #48

Closed fmease closed 3 years ago

fmease commented 3 years ago

The current lexical syntax of paths and punctuation leads to rather unpleasant-looking paths. Consider the expression core.nat. + 1 2. Notice the space between the path separator . and the punctuation +, which is necessary to distinguish the pair from the single punctuation .+. Further, it is likely confusing for newcomers, as it looks like applying the function core.nat to three arguments, +, 1, and 2, were it not for the trailing dot.

This mandates the following change to our lexical grammar:

Lex a dot that immediately (that is, without a space in between) follows an alphanumeric identifier and precedes other punctuation (which may include dots) as a separate token. For example, alpha.+!.* should be tokenized to Identifier(alpha), Dot, Punctuation(+!.*). However, alpha.+.beta should result in the tokens Identifier(alpha), Dot, Punctuation(+.), Identifier(beta), not Identifier(alpha), Dot, Punctuation(+), Dot, Identifier(beta)! This means that a cluster of punctuation is never split at a dot unless that dot is the first character of the cluster and the previous token is an alphanumeric identifier.
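A minimal Rust sketch of the rule described above, not the actual Lushui lexer. The names Token, is_punctuation, is_identifier_char and lex, the punctuation alphabet, and the treatment of digits as identifier characters are assumptions made for illustration; hyphenated identifiers and number literals are left out.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Identifier(String),
    Dot,
    Punctuation(String),
}

// Assumed punctuation alphabet; the real set is not listed in this issue.
fn is_punctuation(c: char) -> bool {
    ".+-*/\\|!?<>=~&^%$#@:".contains(c)
}

// Simplification: digits count as identifier characters, hyphens do not.
fn is_identifier_char(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

fn lex(source: &str) -> Vec<Token> {
    let chars: Vec<char> = source.chars().collect();
    let mut tokens = Vec::new();
    let mut index = 0;
    // True only if an identifier ended exactly at `index` (no whitespace since).
    let mut after_identifier = false;

    while index < chars.len() {
        let c = chars[index];
        if is_identifier_char(c) {
            let start = index;
            while index < chars.len() && is_identifier_char(chars[index]) {
                index += 1;
            }
            tokens.push(Token::Identifier(chars[start..index].iter().collect()));
            after_identifier = true;
        } else if is_punctuation(c) {
            let start = index;
            while index < chars.len() && is_punctuation(chars[index]) {
                index += 1;
            }
            let cluster: String = chars[start..index].iter().collect();
            if cluster == "." {
                // A lone dot is always the path separator under this proposal.
                tokens.push(Token::Dot);
            } else if after_identifier && cluster.starts_with('.') {
                // Split off only the leading dot; the rest stays one cluster.
                // Assumption: the rest is always emitted as Punctuation.
                tokens.push(Token::Dot);
                tokens.push(Token::Punctuation(cluster[1..].to_string()));
            } else {
                tokens.push(Token::Punctuation(cluster));
            }
            after_identifier = false;
        } else {
            // Whitespace and unknown characters are skipped in this sketch.
            index += 1;
            after_identifier = false;
        }
    }
    tokens
}
```

With this sketch, lex("alpha.+!.*") yields Identifier(alpha), Dot, Punctuation(+!.*), while a cluster that does not directly follow an identifier, such as a standalone .+, stays a single Punctuation token.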

After this change, there are still situations where one needs to insert additional whitespace between path segments, albeit in far less common scenarios, namely with namespaces (modules, types, record constructors) aliased to punctuation. Let's say we have a module used as +|+ (why would you, satan?) and we want to access its members alpha and ?\. For the first, it's +|+ .alpha and for the second, it's +|+ . ?\. Very reasonable. (My initial thought was to disallow aliasing those, but that would just introduce unnecessary edge cases; punctuation constructors are relatively common, too.)

fmease commented 3 years ago

This would allow us to alias core's canonical function-composition function to . just like in Haskell, which is a bonus for familiarity. Qualified form: external.core.. (the final dot being the function's name). For the record, we planned to name it << and to name reverse/flipped composition >>, which is close to what PureScript does (its operators are <<< and >>>). Let's see what we are going to do in the end.

fmease commented 3 years ago

Note that this would still not allow a leading dot for paths – for whatever reason we might like to have that (replacement for crate or external, some new record field syntax, …). As an example, alpha .beta would still be parsed as a single path with the changes described above and not as a function application.

fmease commented 3 years ago

> This would allow us to alias core's canonical function composition function to .

Actually, no. It wouldn't with the changes specified in the issue description. In . f g, . is a Dot, not a Punctuation(.). We could parse it differently, but that only gets more confusing and complicated. A call to a function taking the composition function and something else as arguments, take . other, would still be parsed as a path, just like take.other.

We could make this work by lexing a . b as Identifier(a), Punctuation(.), Identifier(b) and a.b as Identifier(a), Dot, Identifier(b). Probably as a result, we would lex .a as Dot, Identifier(a) and . a as Punctuation(.), Identifier(a).
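A small Rust sketch of this whitespace-sensitive variant, limited to identifiers and lone dots. The function name lex_whitespace_sensitive and the exact rule are assumptions: here a dot is the path separator Dot whenever it directly touches a non-whitespace character on at least one side, and Punctuation(.) otherwise. The comment above only pins down the four cases a . b, a.b, .a and . a; how something like a. b should lex is a guess.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Identifier(String),
    Dot,
    Punctuation(String),
}

fn lex_whitespace_sensitive(source: &str) -> Vec<Token> {
    let chars: Vec<char> = source.chars().collect();
    let mut tokens = Vec::new();
    let mut index = 0;

    while index < chars.len() {
        let c = chars[index];
        if c == '.' {
            // Does the dot touch a non-whitespace neighbor on either side?
            let touches_left = index > 0 && !chars[index - 1].is_whitespace();
            let touches_right =
                index + 1 < chars.len() && !chars[index + 1].is_whitespace();
            if touches_left || touches_right {
                tokens.push(Token::Dot);
            } else {
                // A free-standing dot is an ordinary punctuation binder.
                tokens.push(Token::Punctuation(".".to_string()));
            }
            index += 1;
        } else if c.is_alphanumeric() || c == '_' {
            let start = index;
            while index < chars.len()
                && (chars[index].is_alphanumeric() || chars[index] == '_')
            {
                index += 1;
            }
            tokens.push(Token::Identifier(chars[start..index].iter().collect()));
        } else {
            // Whitespace and anything else is skipped in this sketch.
            index += 1;
        }
    }
    tokens
}
```

Under these assumptions, take . other lexes to Identifier(take), Punctuation(.), Identifier(other), whereas take.other remains a path.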

With that, we could actually make paths with a leading dot mean external paths without the need to resort to a new keyword like extern or external. Incidentally, there would be no way to refer to "external" itself anymore, since a . on its own would mean Punctuation(.) and thus resolve to a local binding named that way. This might or might not be what we want. I don't see any meaning of "external" on its own.

Edit: We could also use .thing to mean a record field whose base type is inferred or in general "inferrable paths": E.g. .unit might get inferred as Unit.unit.

fmease commented 3 years ago

Note that composition might still look hideous to most people. Consider the Haskell snippet applyC . applyB . applyA. In Lushui, it becomes . apply-c (. apply-b apply-a) or << apply-c (<< apply-b apply-a).
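To illustrate the nesting that the prefix form makes explicit, here is a small Rust sketch. The names compose, apply_a, apply_b, apply_c and pipeline are hypothetical stand-ins, and the function bodies are arbitrary.

```rust
// Right-to-left composition: compose(f, g)(x) == f(g(x)).
fn compose<A, B, C>(f: impl Fn(B) -> C, g: impl Fn(A) -> B) -> impl Fn(A) -> C {
    move |x| f(g(x))
}

fn apply_a(x: i32) -> i32 {
    x + 1
}

fn apply_b(x: i32) -> i32 {
    x * 2
}

fn apply_c(x: i32) -> i32 {
    x - 3
}

// Mirrors `. apply-c (. apply-b apply-a)`: the inner composition is an
// explicit argument instead of being chained with an infix operator.
fn pipeline() -> impl Fn(i32) -> i32 {
    compose(apply_c, compose(apply_b, apply_a))
}
```

Calling pipeline()(5) runs apply_a first, then apply_b, then apply_c.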