gilch / hissp

It's Python with a Lissp.
https://gitter.im/hissp-lang/community
Apache License 2.0

& doesn't factor well in synexpand #227

Closed. gilch closed this issue 7 months ago.

gilch commented 1 year ago

One of the major perks of concatenative languages is the easy factoring. One can factor out almost any string of words and give them a name that can be used in its place. There are some subtleties around paired words, however.

Synexpand was designed to be concatenative, first using a literal stack at runtime, and now using term rewriting at compile time to enable the use of macros.

The mathematically "correct" model is probably the composition of normal-order application functions of lists to lists. Nice star to navigate by, but unworkable in Hissp.

Normal order quickly becomes computationally infeasible as it has to repeat calculations that have already been done. There are two workarounds to this: default applicative-order evaluation (with normal-order special forms as needed) and memoization (with laziness and immutability, a la Haskell). The latter is not going to work in Python; mutability is too pervasive. The former is just Lisp, and, well, ad hoc fragments thereof in most mainstream languages.

That's why macros have to work, and using a run time stack or building up composed (applicative order) functions isn't going to be good enough.

There's also a bit of an impedance mismatch when hosting that model.

First, Lisp uses variadic functions pervasively and relies on explicit delimiters to stop taking arguments, while concatenative language functions must have fixed arity. I've found a pretty good solution with the explicit depth markers, ^ through ^^^^, which work up to about arity 4, where they get unwieldy (which is about the reasonable limit of heterogeneous positional arguments anyway IMHO; feature, not bug--make the right way obvious). More can be handled with MARK, PACK, SPLAT, and -data (><*,), although I'm still not sure what to do about kwargs here.

And second, Python doesn't do multiple returns, but concatenative functions must be able to leave any number of additional elements on the stack. In practice (in Python), you'd use a tuple, unless you just need one, in which case you'd just return the thing directly rather than using a single-element tuple. Usually. Awkward. Currently, Synexpand wraps the result in a prog1 if there's more than one element left, effectively returning the top one, while still allowing for side effects. If you want the whole stack returned, you can simply pack as the last operation.

So, if you want to factor out a word string that doesn't happen to end in a single return, you add a pack, and then usages of the extracted word have to use a splat. Not seamless, but still pretty easy. (Again, subtleties around paired words, but that's true of other concatenative languages.)
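
To make the pack/splat step concrete, here's how the same pattern reads in plain Python (an illustration only, with hypothetical names, not Synexpand's actual expansion):

```python
# Illustration only: modeling stack words as plain Python functions.
# A factored-out word string that leaves several items "on the stack"
# has to pack them into a tuple; usage sites then splat it back.

def scale_pair(x, y):            # hypothetical extracted word, ending in a pack
    return (x * 2, y * 2)        # two stack items, packed into one tuple

def hypot_word(x, y):            # an ordinary fixed-arity word
    return (x * x + y * y) ** 0.5

# Usage site: splat the packed result back onto the "stack".
hypot_word(*scale_pair(3, 4))    # -> 10.0
```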

And this almost works, but PICK & breaks it. With a run time stack, it was trivial. Just duplicate the pointer. With term rewriting, it's not that easy. The naive (and mathematically "correct") approach is to simply duplicate the term, but now we run into all the potential efficiency and side effect problems that entails. Currently, it tries to intelligently detect if it's a simple literal, in which case it won't have side effects (but may still be arbitrarily large) and duplicates the term; otherwise, it introduces a local first (via let) and duplicates that. Amazingly, this mostly just works, but now if you've extracted it packed, you can't just splat it at its usage site. You no longer have a simple stack (tuple) of terms. It's now divided into a let assignment and body.
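
In Python terms, the two strategies look roughly like this (hypothetical expansions, not what synexpand actually emits):

```python
# The two rewriting strategies for PICK, illustrated as hypothetical
# Python expansions (not synexpand's actual output).

def add(a, b):
    return a + b

def expensive(x):
    print("side effect!")    # runs once or twice, depending on the strategy
    return x * x

# 1. Naive term duplication: repeats the computation and its side effects.
r1 = add(expensive(3), expensive(3))

# 2. Introduce a local first (the let-style rewrite), then duplicate the
#    name instead of the term. The "stack" is no longer a simple tuple of
#    terms; it's split into an assignment and a body.
tmp = expensive(3)
r2 = add(tmp, tmp)
```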

gilch commented 1 year ago

I'm tempted to simply copy the term, simple literal or no. If you really don't want it evaluated twice, wrap it in a let yourself (although the point of being point-free is that you don't have to do that!). However, there are more details around exactly when macros should be applied that make this complicated. Fixing & might not be enough.

I could maybe bury the assignment pairs at the bottom of the stack, and then wrap the result in something that puts them in scope instead of the prog1. Some things along these lines seem workable, but still have problems.

This is probably not happening before the 0.4 release.

gilch commented 10 months ago

I just want to note somewhere that I currently consider the entire synexpand suite highly experimental and unstable. The current version seems usable, though I question its utility.

gilch commented 9 months ago

I'm just going to copy the subexpression. Yes, this is problematic if there are side effects (or even if a pure computation is too expensive), but even DROP has that problem. It currently deletes the next expression, so if you want the side effect, but just not its result, DROP does the wrong thing.
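
Roughly, in Python terms (my own illustration):

```python
# Illustration of the two possible meanings of DROP, in Python terms.

def log_and_get():
    print("logging...")      # a side effect you might still want
    return 42

# Term-rewriting DROP effectively deletes the expression,
# so the side effect never happens at all.

# A runtime drop would evaluate the expression and discard only the
# result, keeping the side effect.
_ = log_and_get()
```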

Read time is too early. This kind of thing needs to be done at a different stage. I think compile time is also too early. It should probably happen at definition time, which means using the higher-order function interpretation. Python isn't normal-order, so we still need the read- or compile-time interpretation for macro control structures.

But there's no reason we couldn't have point-free argument list manipulation decorators in addition to the read- or compile-time term rewriting. E.g. DROP as lambda f: lambda x, *xs: f(*xs); DUP as lambda f: lambda x, *xs: f(x, x, *xs); etc. These could serve as adapters to whatever stack gets written out.
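
Spelled out as ordinary Python decorators, that idea looks something like this (a sketch only; the names merely echo the stack words above):

```python
# Sketch: point-free argument-list manipulation as decorators.
# Each one adjusts the leading positional arguments before the call.

def drop(f):
    """Discard the first positional argument before calling f."""
    return lambda x, *xs: f(*xs)

def dup(f):
    """Duplicate the first positional argument before calling f."""
    return lambda x, *xs: f(x, x, *xs)

@dup
def square(a, b):
    return a * b

square(7)                         # -> 49: the one argument was duplicated
drop(print)("ignored", "kept")    # prints only: kept
```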

Ideally, @drop, @pick, and @roll are sufficient, at least for positional arguments (still need to think about kwargs), which is why synexpand has them. Also ideally, these would just be functions, but their implementations are simple enough to be inlined, so macros are feasible.

gilch commented 9 months ago

Hmm, not that easy. The ideal functions would take and return a stack, meaning possibly multiple return values (or none), and possibly extra arguments that simply pass through. But Python's library functions are of course not written like that. Adapting them requires knowing their intended arity, both in and out. PICK and ROLL also need a depth. That's four arguments: in, out, function, and depth. It's not clear what order they should be in, and (except for the function itself) they're all magic numbers! This is not going to be legible.
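
For concreteness, a hypothetical adapter with the in/out arities as parameters (nothing like this exists in synexpand; PICK and ROLL would additionally need a depth):

```python
# Hypothetical adapter from an ordinary Python function to a
# stack-to-stack word, to show how call sites fill up with magic numbers.
# (PICK and ROLL would need a depth parameter on top of this.)

def word(f, n_in, n_out=1):
    """Consume n_in items from the front of the stack tuple, push n_out
    results, and pass the rest of the stack through untouched."""
    def run(stack):
        args, rest = stack[:n_in], stack[n_in:]
        result = f(*args)
        pushed = (result,) if n_out == 1 else tuple(result)
        return (*pushed, *rest)
    return run

add2 = word(lambda a, b: a + b, 2)       # in=2, out defaults to 1
divmod2 = word(divmod, 2, 2)             # in=2, out=2
add2((3, 4, "passes through"))           # -> (7, 'passes through')
divmod2((17, 5))                         # -> (3, 2)
```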

out has a sensible default of 1, but maybe we still want a way to change it? SPLAT might be good enough.

depth could be baked into the word (it's how Forth does it), at least up to a certain depth. We're looking at bloating the number of macros quite a bit that way, however. With more regular naming, they could all be generated from a small number of templates. pick0 (dup), pick1 (over), etc.
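
Generating those from a template is easy enough to sketch in Python (the pick0/pick1 names are taken from above; this is only illustrative):

```python
# Sketch: generating fixed-depth PICK adapters from a single template.
# pick0 is dup, pick1 is over, and so on.

def make_pick(depth):
    def pick(f):
        def adapted(*xs):
            # Copy the item `depth` positions down and push it on top.
            return f(xs[depth], *xs)
        return adapted
    return pick

pick0, pick1, pick2 = (make_pick(n) for n in range(3))

@pick1                     # "over": copy the second item to the top
def report(copied, top, second):
    return (copied, top, second)

report(10, 20)             # -> (20, 10, 20)
```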

in arity could be automatically detected, but not in all cases. Common cases would be 0-4. J defaults to 1 or 2 depending on usage. You can do a 0 by passing in a dummy value. Haskell strictly uses 1 and typically uses currying for more. That could maybe be done with partial, although that amounts to counting in unary. Either could get more by passing in a single data structure. So you'd use mark and pack to pre-bundle.
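
For the currying route, functools.partial does work, but as noted it amounts to counting in unary (illustrative only):

```python
from functools import partial

def add3(a, b, c):
    return a + b + c

# One wrapper per pre-supplied argument: counting in unary.
step1 = partial(add3, 1)
step2 = partial(step1, 2)
step2(3)                   # -> 6
```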

Need to think about this more.

gilch commented 7 months ago

So far, I think this is the last one I have to get done for the v0.5.0 release.

gilch commented 7 months ago

For reference, ae02d33c345075efa5058d5f4faf4ffad942df3b changed it from a stack machine to a term rewriting system. I currently think Lissp needs both, perhaps with some modifications, since neither seems adequate alone. I could use a single reader macro interface and detect if it's being applied to an atom or a tuple, which would use the stack machine or the term rewriting, respectively. (The term rewriting system currently needs the ^#-^^^^# adapters to apply to an atom. Those would go away.)

The stack machine expansions did some rather verbose bookkeeping for even small programs. More readable expansions are one of the advantages of the term-rewriting system. I briefly looked into using higher-order functions instead of the stack machine, but I don't think those would be any less verbose given the no-dependency condition. The implementations would have to be inlined, or at least locally defined up front.

It's hard to say which would be more performant without implementing it. HOFs would require each function to be wrapped, which would probably take partials or a couple of lambdas each, and would use a lot of tuple packing/unpacking/slicing. The stack machine only needs to mutate the single list and can call the functions natively. However, it needs a call to pop each argument, and arguments would outnumber functions; on the other hand, maybe not (typically) by a factor greater than two, and these appear to be the cheaper fixed-arity calls. Hissp never really prioritized performance anyway. If you need that, you can always inline Python (or write it in C).

gilch commented 7 months ago

The current design seems worse than I thought. I feel like I can improve on it a lot, but not enough to get everything I want. Neither approach is adequate. They're both arguably too complex for bundled macros already. I can streamline them, but not enough. While some machinery could be shared, keeping both is worse in that respect. I don't know how to combine them sensibly either. There's too much of an impedance mismatch among all the paradigms.

The synexpand macros are dragging Hissp development down too much. I need to jettison them. They were experimental anyway. Sunk cost maybe made me too reluctant to drop them, but synexpand can still be a side project in a separate repository, this time without the standalone restriction.

Synexpand maybe tried to do too much at once. It was Hissp's version of Arc's ssyntax, and a spiritual successor to Drython's stack module. I was hoping for something approaching APL terseness, but with more words than symbols. It became a general-purpose language in its own right. I thought that would be more elegant, but that just made it complicated. Ssyntax is arguably mainly for chaining, which Hissp can handle pretty generally already with ->, though not as tersely.

I may be able to make smaller pieces that each do less but can be combined into something like what synexpand was trying to do. This is probably a better fit for the bundled macros, which are supposed to be minimalist. I expect it won't be as terse though.