gilch / hissp

It's Python with a Lissp.
https://gitter.im/hissp-lang/community
Apache License 2.0

Rethink `!` and `#""` #187

Closed gilch closed 8 months ago

gilch commented 1 year ago

I'm pretty sure I'm not going to change this one before the next release, but I need to write it down.

I'm not especially satisfied with the Extras system, although it seems to work pretty well so far. I am regretting reserving an extra character for the reader, beyond what is traditional for Lisp. I've since seen at least one Clojure library using symbols with a ! prefix, which are now awkward to write in Lissp. Interop with such a thing would be more difficult. Lisscad also wanted it. When I first enumerated possible characters for the Extra system in #80, I excluded those used as Python operators, but now I think that ^ might have been a reasonable choice. It certainly would have fit with Clojure better, which reserves that as a reader macro for applying metadata. But now we'd have to spell bitwise XOR differently, even if a library wanted to rename everything from operator. But, as Python operators go, this is not a common one, and xor is not a bad spelling for something used so infrequently. And besides, you can still inject it.

Another option I'm considering is #. Tags would put it last, and extras would put it first, so we can tell them apart. I initially dismissed it because it probably doesn't play well with hash strings. However, saving the character might be worth changing the string syntax. To what, I'm not sure though. Maybe Lissp would only support raw strings and have a bundled macro like b# to process escapes. We could even call it !# with the bang freed. No atom special case, and hash strings would only need one more character.

If you're not using the bundled macros, you can still use inject (same as bytes now), although you'd have to layer quotes, e.g. .#"'\n'" for a newline. A weird case is .#"'\"'" for a double quote. Python seems fine with this though. It's still a valid escape even if the string itself is delimited with single quotes.
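The layered quoting can be checked from plain Python. This is a minimal sketch, assuming the injected text is simply evaluated as Python source. `ast.literal_eval` stands in for the compiler here, and the Python raw strings model the contents of a raw Lissp string.

```python
import ast

# Contents of the (hypothetical) raw Lissp strings: the Python source to inject.
newline_src = r"'\n'"  # what .#"'\n'" would hand to Python
dquote_src = r"'\"'"   # what .#"'\"'" would hand to Python

# Python processes the escapes when it evaluates the inner literal.
newline = ast.literal_eval(newline_src)
dquote = ast.literal_eval(dquote_src)

print(repr(newline))
print(repr(dquote))
```

The second case works because `\"` is a recognized escape in any Python string literal, whichever quote character delimits it.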

gilch commented 1 year ago

So,

It occurs to me that a hash string could be written as a dot string instead. Keep the special-cased atom, but just spell it differently.

gilch commented 1 year ago

Reader macros with aliases (a major motivation for Extras in the first place).

b#"bytes" ; With b# in _macro_.

(hissp.._macro_.alias H hissp.._macro_)

H#!b"bytes" ; Current.
H##b"bytes" ; Using # instead of !
H#^b"bytes" ; Using ^ instead of !

Comment strings (a major motivation for keeping the current Extra system).

<<#
#;foo
#;bar
#;baz
."\n"

<<#
#;foo
#;bar
#;baz
!#"\n"

<<#
^;foo
^;bar
^;baz
#"\n"

Decorators

@#!str.swapcase
@#!str.title
(define spam 'spam) 

@##str.swapcase
@##str.title
(define spam 'spam) 

@#^str.swapcase
@#^str.title
(define spam 'spam)

gilch commented 1 year ago

."" might not play nice with lisp-mode. Older Lisps used (foo . bar) dotted syntax for a cons. A single dot would have to be escaped to act as a symbol. You can't have more than one in a list. You also can't have one outside a list. Probably shouldn't be expected to be compatible with Lisp editors. !# is looking better now.

Another option might be :"". That's currently parsed as a control word followed by a string (Emacs Lisp parses it the same way). You could still get that by adding a space. (:\"\" is a control word, btw.)

gilch commented 1 year ago

> Comment strings (a major motivation for keeping the current Extra system).

Not anymore! Parinfer has trouble with unbalanced quotes in comments. It can handle them spread over multiple lines, but not if there's any other token (like !) in between the comments. It is possible to stack !s to avoid interleaving them, but that gets awkward quickly.

Once I ran into this problem I started looking for alternatives. It turned out to be not that hard to tokenize comments in blocks, which means <<# will no longer need Extras at all. It's also not too hard to get the contents out of such blocks in a consistent way: strip the indents, followed by any number of ;s and up to one following space. The tokens must end in a newline for the REPL to continue properly, so the final newline must be stripped as well. That's enough for <<# to handle arbitrary raw strings.
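The stripping rule described above can be sketched in Python. This is an illustration only, not the reader's actual code: the function name `comment_block_contents` is hypothetical, and it assumes the block arrives as a single string of comment lines ending in a newline.

```python
import re

# Leading indent, then one or more semicolons, then at most one space.
_PREFIX = re.compile(r"^[ \t]*;+ ?")


def comment_block_contents(block: str) -> str:
    """Extract the raw text carried by a block of comment lines."""
    lines = block.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the final newline the tokenizer requires
    # Only one space after the semicolons is removed; any further
    # indentation is part of the string's contents.
    return "\n".join(_PREFIX.sub("", line) for line in lines)


example = "  ;; foo\n  ;;  bar\n  ;baz\n"
print(repr(comment_block_contents(example)))  # → 'foo\n bar\nbaz'
```

Because exactly one optional space is consumed, both the `;foo` and `;; foo` styles yield the same text, while deliberate extra indentation survives.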

The old version using extras had some stylistic benefits. The non-comment primary was a way to avoid the dangling bracket. (A discarded item will still do, as always, but it's less natural or enforced.) It also meant that there was no choice to be made about the number of leading ;s or whether or not to use a following space.

The new version also has advantages. Converting from a string literal is as simple as commenting out the lines with the editor and removing the quotes. Converting back is just the reverse of the process.

Without that motivation for keeping the current Extra system, I'm more willing to explore more diverse alternatives.

gilch commented 1 year ago

The arity/depth suffix ^^^ used in synexpand might be a viable alternative approach. I initially ruled it out because of the comment motivation. I think approximately arity four is a reasonable upper limit for heterogeneous positional arguments. Any more than that and you should be using kwargs instead, or you're imposing too much meaningless baggage in the ordering. (For homogeneous args, you can pass in/splat in an iterable instead.) That's why XYZW# doesn't have a quinary analogue, just en#X#. The exceptions are rare enough that I'd rather make you write your own than include one and impose it on everybody else.

It's a much better design to have string literals require a closing quote than an opening quote and an upfront count (which used to be a thing in assembly languages). That's why this felt like a bad design. But that's only because counting that high accurately is hard. Humans have no difficulty counting to three. Four is pushing it, but it's less than the typical expected usage. Making them count in unary with a tally will probably make them reluctant to go too high.

So foo#!!! a b c primary works now and doesn't look too bad. foo#### a b c primary is pushing it with 4, but ought to be uncommon. foo## a primary and even foo### a b primary look fine.

But how to handle kwargs? It seems like options would be a common use for extras. It would be nice to have an easy way to pass them in. Currently, you're stuck with foo#!!! : bar spam primary. If it required all !s to be up front, then putting a symbol on the other side could be an indicator of something. Perhaps foo#bar!spam primary would resolve like foo(primary, bar=spam). foo#*!**! spam eggs primary would resolve like foo(primary, *spam, **eggs). This feels pretty good. At some point (4?) you'd stop stacking bangs and wrap the args in a tuple and splat with a *!.
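To make the proposed resolution concrete, here is a rough Python model. Everything in it is an assumption for illustration: the helper `call_with_extras` is hypothetical, the part of the tag after `#` is passed as a string of `!`-terminated prefixes, and the extras are passed as already-read values.

```python
def call_with_extras(fn, prefixes, extras, primary):
    """Model foo#<prefixes> <extras...> primary as a Python call."""
    keys = prefixes.split("!")[:-1]  # "*!**!" -> ["*", "**"]; "!!!" -> ["", "", ""]
    if len(keys) != len(extras):
        raise ValueError("need one extra per bang")
    args, kwargs = [primary], {}
    for key, value in zip(keys, extras):
        if key == "":      # bare bang: positional extra
            args.append(value)
        elif key == "*":   # unpack an iterable of positionals
            args.extend(value)
        elif key == "**":  # unpack a mapping of keywords
            kwargs.update(value)
        else:              # name before the bang: keyword argument
            kwargs[key] = value
    return fn(*args, **kwargs)


def foo(*args, **kwargs):  # stand-in target that echoes what it received
    return args, kwargs


print(call_with_extras(foo, "bar!", ["spam"], "primary"))
# → (('primary',), {'bar': 'spam'})
```

In this model, `foo#*!**! spam eggs primary` corresponds to `call_with_extras(foo, "*!**!", [spam, eggs], primary)`, which calls `foo(primary, *spam, **eggs)`.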

I'm not sure how easily I could make the reader do this, but it's certainly possible.

One major issue is that symbols ending in a bang are a common warning about functions with side effects in other lisps. I'd rather not preclude that convention. Using # instead just makes them look like stacked reader macros, like the en#X# idiom. I don't want to preclude that either. ^ might be a good choice: foo#*^**^ foo#bar^spam, but it might conflict with synexpand now. Putting the # first for extras will cause the same editor issues that convinced me to switch to putting it last in the first place.

gilch commented 10 months ago

I'm considering something like REBOL's refinements to handle keyword arguments in synexpand. They look like foo/spam/eggs or foo/eggs/spam, depending on which order you want to pass in the additional named arguments to foo, which you'd pass in positionally like usual. REBOL's docs described them as "adjectives", which in English, at least, come before the word they're describing. So I'm also considering syntax like eggs=spam=foo or spam=eggs=foo.
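A small Python analogy for the refinement idea, assuming (as in REBOL) that each refinement consumes one additional trailing argument. The helper `refine` is hypothetical and only models the call semantics, not any reader syntax.

```python
def refine(fn, *names):
    """Model foo/spam/eggs: trailing positional inputs become named arguments."""
    def refined(*args):
        split = len(args) - len(names)
        if split < 0:
            raise TypeError("each refinement needs one extra argument")
        named = dict(zip(names, args[split:]))
        return fn(*args[:split], **named)
    return refined


def foo(*args, **kwargs):  # stand-in target that echoes what it received
    return args, kwargs


print(refine(foo, "spam", "eggs")(1, 2, 3))  # → ((1,), {'spam': 2, 'eggs': 3})
```

Swapping the refinement order, as in `refine(foo, "eggs", "spam")`, changes only which trailing input binds to which name.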

We could use special refinement words to unpack, like *=**=foo or foo/*/** and, like in Python calls, these could be repeated.

It would be nice if the multiary reader macros could work similarly, like foo/refinement# or refinement=foo#. But what order should that imply for inputs? The same as Python or the same as written, which is closer to how extras work now? (So one keyword then one positional.) Or should it be refinement=foo## for that? We can tell that an extra argument is required per refinement, but we still need at least one # at the end to recognize it as a reader tag, even if the sole argument is named by a keyword. What do you do then? Or do we? Reader tags could be recognized by the presence of an internal = as well. Not sure how I feel about this. And how does all of this interact with the EDN Hissps?

I think I need to play with synexpand variations more first. Once I'm happy with it, I can apply the lessons learned to overhaul/replace the extras system.

gilch commented 8 months ago

I'm making some progress on the tag system. sep=print#### : 1 2 3 works, as does print## =### 1 2 3 sep=# :, not that you should have side effects at read time. Arity is strictly the number of (unescaped) trailing #s. Empty tags require at least one prefix (like the =###), and create a Pack object, which is like Extra, but stores kwargs separately. This frees up !, which is no longer a reader macro.
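The arity rule is simple enough to sketch in Python. This is not the reader's code: `split_tag` is a hypothetical name, and it assumes that escaping means a preceding backslash.

```python
def split_tag(token):
    """Split a tag token into (prefix, arity) by counting trailing unescaped #s."""
    end = len(token)
    arity = 0
    while end > 0 and token[end - 1] == "#":
        # Count the backslashes immediately before this '#'.
        backslashes = 0
        i = end - 2
        while i >= 0 and token[i] == "\\":
            backslashes += 1
            i -= 1
        if backslashes % 2:  # odd count: this '#' is escaped, so stop
            break
        arity += 1
        end -= 1
    return token[:end], arity


print(split_tag("sep=print####"))  # → ('sep=print', 4)
print(split_tag("=###"))           # → ('=', 3)
```

Under this reading, `sep=print#### : 1 2 3` takes four arguments, the first of which is bound to `sep`.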

I am really trying not to overcomplicate this, without giving up any power. I'm not sure if it's working. There's no longer a special "primary" argument. That was kind of confusing anyway, useful as it was. I'm missing it, but it's not a good conceptual fit now that we can use higher arities. I could certainly create more concise syntax by special-casing more characters, but that gets complicated. That goes the way of Perl. I'm not going there. The new system only special cases # and =, and only in tags. Pack is kind of magic, but it uses = and # in exactly the same way.

I did have to add some syntax error cases, like when there aren't enough #s for the =s (DRY violation, or redundancy check?) or when you try to use a Pack as kwarg (Inelegance?). It took more code than I expected in the reader to pull this off. Bad sign, but I haven't nailed it down enough to refactor it yet. It might get better. It also doesn't seem any easier to explain in the docs. Also a bad sign, but also might get better.

gilch commented 8 months ago

The =/# redundancy is a problem. The issue is that tags should have at least one argument, since a zero-argument tag could only operate by side effects, and tags should mostly not have side effects. But the one argument might be either a keyword or a positional argument, so a tag could have zero positional or zero keyword arguments, just not both at once. I also want the single positional argument tags to be super terse (like foo#), since they're the common case.

I thought of a better solution. Use only the one # (which will simplify parsing a little), count each (unescaped) =, and add one to that, but only if there's a following name. Pack macros must always have an explicit argument, so they're minimum 1 as well.

That means a single kwarg macro must use a pack object as its sole positional argument, which is made with some other macro. So it would typically look like foo#bar=#baz where bar is the keyword and baz is the value for it. Similarly for the special unpacking names, like foo#*=#xs. (You can even force zero arguments like foo#*=#().) This is not super concise, but doesn't need to be and is not bad. Kwargs are going to add a lot to the length just for the keyword, so the overhead is usually not significant. Similarly for *=, which is mostly for compatibility. If you really want the macro to take a single collection without the overhead, just define it like that in the first place.

One thing I don't like as much is how extra positional arguments look. They'd only be expressible with more = prefixes, so you'd get aliases like =H:#b"bytes" instead of H:##b"bytes" or H:#!b"bytes". Maybe I'll get used to it. I'll try it for a while. An alternative might be to use suffixes (like Rebol) instead of prefixes. Then it would look like H:/#b"bytes". That doesn't look bad, but I might want some other character for better Garden of EDN compatibility. And then it looks like the initial name should get the first argument, which is how Rebol does it, but maybe not how I'd like it to work. Hmm.

gilch commented 8 months ago

H:#==#b"bytes" would also work but doesn't seem worth the overhead compared to =H:#b"bytes". b=H:#"bytes" doesn't seem better and probably isn't even workable. H:#b=#bytes looks better but seems unworkable for the same reason. What if the b# macro might take kwargs itself? I guess dicts are ordered now, but we're still risking a collision. It really needs to be positional.

gilch commented 8 months ago

OK, I tried it out and it was not so good. Maybe coming up with more test cases first would have saved a few days' work.

I want,

  1. simple unary cases to remain as simple as they are now. E.g., X#.
  2. aliases to work on tags, and to look pretty much like unaliased tags, but shorter (like Clojure) even if they need additional arguments.
  3. small options to be allowed before large targets. E.g., spy# with file=sys.stdout.
  4. to not overcomplicate the reader.
  5. to not overcomplicate the language and required docs.

5 is especially vague.

2 is why I gave up on #244. Switching to postfix wouldn't have helped enough.

3 is kind of an impedance mismatch with Python, which led to the weird "primary" ordering in the current Extras system. Rebol-style postfixes would also have this problem. Without this, we might as well restrict tags to a single argument like EDN and use inject everywhere. I hope to do better than this.

It would also be nice if I could give back the ! macro character.

gilch commented 8 months ago

The mathematically "simplest" possible version is unary with currying, like Haskell (lambda calculus). This requires tag macros (functions) to be recursive (first-class), i.e., they can result in another tag. I think this still requires parentheses for ordering, however.

Polish notation can eliminate the parentheses, and Lisp is prefix to begin with. (It only needs parentheses due to pervasive variadic functions.) If the arity of each tag is fixed and known, they're not required.
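The point about fixed arity can be shown with a tiny Python evaluator for Polish notation. The operator table and names are made up for the example. What matters is that each operator knows how many operands to read, so no brackets are needed.

```python
import operator

# Each operator has a fixed, known arity.
TABLE = {
    "neg": (1, operator.neg),
    "add": (2, operator.add),
    "mul": (2, operator.mul),
}


def eval_polish(tokens):
    """Evaluate a prefix expression whose operators all have fixed arity."""
    stream = iter(tokens)

    def read():
        token = next(stream)
        if token in TABLE:
            arity, fn = TABLE[token]
            # Recursively read exactly `arity` operands, left to right.
            return fn(*[read() for _ in range(arity)])
        return int(token)

    return read()


print(eval_polish("mul add 1 2 neg 3".split()))  # → -9
```

The token stream `mul add 1 2 neg 3` groups as `mul(add(1, 2), neg(3))` without any parentheses, which is the property a fixed-arity tag system would rely on.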

The minimum arity without recursive macros (tags) is two, which amounts to consing. (Unary with a void/drop value can also work but has to sneak information in via side effect. This was the inspiration for the current Extras system, which is basically that, but without the side effects.)

And finally, kwargs make everything more complicated. There are different ways to represent them positionally. The : used in Hissp makes sense there, but it's too much overhead for extras.

Consing can make simple lists, but also alists, or even a simple list prepended to an alist. A call like foo(1, 2, 3, a=4, b=5) could be written with Polish conses like

foo# !1 !2 !3 !!a 4 !!b 5 NIL

Or perhaps

foo# !1 !2 !3 !!a 4 !!b 5 :

I expect this would simplify the reader because I could cut out a lot of special cases around Extras. That nil really feels like a closing bracket, but that's just a dummy value workaround for having a single arity. You could use a unary nilcons (:#, say) for the final element.

foo# !1 !2 !3 !!a 4 :#!b 5

Still kind of a closing bracket, but at least it's on the left side, meaning the final argument can be something multiline without requiring a closer afterwards.

This feels pretty similar to the current system but has a simpler implementation. It's probably slightly easier to use since you don't have to pass in a : to separate args from kwargs. The pairs could really go anywhere in the list, although the positional arguments still have to be ordered correctly relative to each other. spy# !!file sys..stdout :#(foo) ought to be possible.
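One way to read the consing proposal, sketched in Python. This is an interpretation, not a description of any implementation: `!` is modeled as a binary `Cons`, an element that is itself a `Cons` is treated as a keyword/value pair, and `None` plays the role of `NIL`. The names `Cons` and `call_from_conses` are hypothetical.

```python
from typing import Any, NamedTuple


class Cons(NamedTuple):
    car: Any
    cdr: Any


NIL = None


def call_from_conses(fn, cell):
    """Walk a cons list; nested pairs become kwargs, other items positionals."""
    args, kwargs = [], {}
    while cell is not NIL:
        item, cell = cell
        if isinstance(item, Cons):  # e.g. !!a 4 conses the pair (a . 4)
            kwargs[item.car] = item.cdr
        else:
            args.append(item)
    return fn(*args, **kwargs)


def foo(*args, **kwargs):  # stand-in target that echoes what it received
    return args, kwargs


c = Cons
# foo# !1 !2 !3 !!a 4 !!b 5 NIL
arglist = c(1, c(2, c(3, c(c("a", 4), c(c("b", 5), NIL)))))
print(call_from_conses(foo, arglist))  # → ((1, 2, 3), {'a': 4, 'b': 5})
```

Since pairs are recognized by type rather than position, a keyword pair can appear before a positional element, which matches the `spy# !!file sys..stdout :#(foo)` ordering.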

gilch commented 8 months ago

Giving back !, but reserving a final = could give us

foo##### 1 2 3 a=4 b=5

The number of #s indicates foo#'s arity. The a= and b= are kwarg tags that return a Kwarg object, which is a special cased type in the reader kind of like Extra is now. *= and **= could unpack.

An alternative reserving a final =# instead would look pretty similar.

foo##### 1 2 3 a=#4 b=#5

This would allow a final = in symbols, at the cost of longer kwarg tags.

This feels pretty straightforward. alias would work well. spy# would work OK but need a kwarg. It's not quite as powerful as Extras where custom macros can return an arbitrary number of arguments. But *= and **= (and custom reader macros) are probably good enough, and probably worth it for the simplicity.

Five #s in a row (as in the examples) are too many to be legible, but this is a feature. It helps keep argument counts under control. Make the right way obvious. It's easy to group args.

foo### *=#(1 2 3) a=#4 b=#5

Kwargs are less easy, but still possible.

foo## *=#(1 2 3) **=# .#(dict : a 4  b 5)
foo## *=#(1 2 3) **=# .#(% 'a 4  'b 5)
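The Kwarg idea maps fairly directly onto Python. This is a hedged sketch rather than the reader's implementation: the `Kwarg` class here and the `call_tag` helper are hypothetical, and the operands are assumed to be already-read values in written order.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Kwarg:
    key: str  # a keyword name, or "*" / "**" for unpacking
    value: Any


def call_tag(fn, *operands):
    """Apply fn to operands, special-casing Kwarg objects the way a reader might."""
    args, kwargs = [], {}
    for operand in operands:
        if not isinstance(operand, Kwarg):
            args.append(operand)          # ordinary positional argument
        elif operand.key == "*":
            args.extend(operand.value)    # *=#(1 2 3)
        elif operand.key == "**":
            kwargs.update(operand.value)  # **=#...
        else:
            kwargs[operand.key] = operand.value  # a=#4
    return fn(*args, **kwargs)


def foo(*args, **kwargs):  # stand-in target that echoes what it received
    return args, kwargs


# foo##### 1 2 3 a=#4 b=#5
print(call_tag(foo, 1, 2, 3, Kwarg("a", 4), Kwarg("b", 5)))
# → ((1, 2, 3), {'a': 4, 'b': 5})
```

The grouped forms give the same call with fewer operands: `call_tag(foo, Kwarg("*", (1, 2, 3)), Kwarg("**", {"a": 4, "b": 5}))` models the `foo##` examples.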