azazel75 / macropy

Macros in Python: quasiquotes, case classes, LINQ and more!

[enh] Named arguments for macros #13

Open Technologicat opened 5 years ago

Technologicat commented 5 years ago

On some occasions, being able to pass named arguments to macros would be useful.

Use case, related to the multilambda macro in unpythonic (rackety lambda with implicit begin, for Python):

from multilambda import macros, λ

myadd = λ(x, y)[print(x, y), x + y]
assert myadd(2, 3) == 5

echo = λ(x='hi there')[print(x), x]  # doesn't work, needs named arg support

(Usage, implementation.)

For the same use case, *args and **kwargs support would be really nice. :)

Thoughts?

[edit] update link. [edit2] these links are now obsolete; the silly λ macro has been removed.

azazel75 commented 5 years ago

I don't see a compelling argument here... just "would be really nice"?

Technologicat commented 5 years ago

The argument is that currently, it's not possible to do certain things - such as, in the use case above, define a λ macro that allows default values for its arguments, or allows *args and/or **kwargs.

The point of this macro is to reduce the amount of typing - not for the lambda itself, but for shortening lambda arg0, ...: begin(body0, ...) to λ(arg0, ...)[body0, ...] - so that there is no need to type out the begin(), making it more convenient to write multi-expression lambdas.
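For reference, a minimal stand-in for the begin in question (unpythonic provides the real one; this sketch just relies on Python evaluating call arguments left to right, and returns the last value):

def begin(*exprs):
    # the arguments were already evaluated, left to right, before the call,
    # so the side effects have happened; just return the last value
    return exprs[-1]

myadd = lambda x, y: begin(print(x, y), x + y)
assert myadd(2, 3) == 5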

The example is perhaps a bit silly in that I have no idea if I'll ever use this particular macro in production code - it's so unpythonic that it borders on the limit of good taste. At least I won't use it if there is no way to give defaults to arguments, and it doesn't support *args or **kwargs. :)

I can try to find a better example where the feature is needed; this is just where I first noticed the missing feature, so I thought I'd open a ticket for discussion before I forget.

Technologicat commented 5 years ago

I don't know if it makes the use case any more compelling, but https://github.com/Technologicat/unpythonic/commit/b51337b2f5615afd44c82390fb40c17af6bd17e5 makes λ a first-class citizen that can have not only multiple body expressions, but also its own local variables.

To become really useful, λ would need to handle default values for arguments, and *args and **kwargs - i.e. have all the args-handling features of the regular lambda - but currently this can't be done.

I think I could take a look at what it would take to support named args in MacroPy (to support default values for args in λ), but *args and **kwargs are probably better left out until there is no need to support Python 3.4.

Technologicat commented 5 years ago

Here, I made a first cut of this: https://github.com/Technologicat/macropy/commit/653b2d2292215b1ff9aff0a0c75bc7085ad8b6b5

At least all tests still pass, so I probably didn't break much :3

Here's also an updated λ that uses the new mechanism, for declaring default values: https://github.com/Technologicat/unpythonic/commit/4e2a28c9c67244cbdd9aa3ab096ce7dc1a5abb09

How it works: the macro now receives a kwargs argument, an OrderedDict that maps each name given at the use site (a str) to the AST node of its value expression, preserving the order in which the names were given.

As for *args and **kwargs, my suggestion is to ignore that part of my original post; the problem disappears once we upgrade the requirements to Python 3.5. From 3.5 on, the extra given arguments are absorbed into specially formatted items in the args and keywords fields of the Call node. With the present addition, the existing mechanism can handle both.
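For illustration of that representation (stdlib ast only, Python 3.5+):

import ast

call = ast.parse("mac(a, *more, x=1, **extra)", mode="eval").body
print(ast.dump(call))
# *more shows up as a Starred node inside call.args;
# **extra shows up in call.keywords as a keyword whose .arg is None.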

Anyway, named args for macros are now here - thoughts?

[edit] clarify why OrderedDict; mention data types of key and value. [edit2] fix silly mistakes.

Technologicat commented 5 years ago

Re-checking PEP 448, the proposed first-cut solution does need a small revisit after upgrading to Python 3.5, because multiple * and ** items may then appear in the same call.

Multiple * pose no problem. The code that already exists in MacroPy can handle the Starred nodes just like any other arg, and let the macro do what it wants with them, since the macro asked for arguments. ;)

Supporting multiple ** perhaps requires dropping the OrderedDict currently proposed here, and just passing through a list of keyword items. Then let the macro do what it wants with them.

Technologicat commented 5 years ago

Ah, well, second cut:

https://github.com/Technologicat/macropy/commit/ddd9d7545eeac259dcaf06c08be286b3667addfd https://github.com/Technologicat/unpythonic/commit/80af4b8fe692fd78aeaf70f3759fe7be0d2b7581

Ditched the OrderedDict in favor of just passing through the list of keyword objects. The advantage, besides better 3.5 support, is that (in PG's famous? words) the abstraction is so thin it's practically transparent - the user can now use the Green Tree Snakes docs to understand what the magic kwargs contains.
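Concretely, each item in the new kwargs is a plain ast.keyword node; a quick way to inspect what such nodes carry:

import ast

call = ast.parse("mac(x=1, y='a')", mode="eval").body
for kw in call.keywords:
    # .arg is the parameter name (a str), .value is the AST of the value expression
    print(kw.arg, ast.dump(kw.value))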

Now, thoughts? :)

azazel75 commented 5 years ago

Thanks Juha,

but your solution, and the whole situation in general, leaves me more perplexed... With transform() I've opened the door to calling the macro with any positional or keyword argument (even if it still needs to be refined so that the parameters injected by the machinery are protected against being shadowed by those specified by the user), and you propose to augment this somewhat arcane way of passing arguments (which, to me, is what the args parameter to the macro now is) that the macro implementer has to parse on his own, maybe with the added complication of supporting multiple runtime versions... It doesn't seem the right thing to do.

I would rather prefer passing those parameters and keywords as real Python objects, not as AST trees, but when expansion happens there is nothing running yet, so this would work only for parameters and keywords bound to literals or pure expressions... I need to think it over and to see some real, isolated example.

Anyway, please open a PR with your code. I mean, move your commits to another branch (one per PR) and open a PR from it; otherwise your code will not be commentable.

Please also post here some examples of your lambda macro, with comments, so that I can understand what it's meant to do without reading a ton of code.

Technologicat commented 5 years ago

Thanks for the heads-up, I'll make my code commentable and follow up with a PR for discussion.

IMHO, args as an AST is a feature, not a bug; as you said, it's before run-time, so nothing exists yet. Leaving it to each macro to decide what to do with the input ASTs sounds to me like it's exactly within the job description of a macro.

Why some form of args - as a minimal example, consider:

# imports to make this example self-contained (names per the azazel75/macropy fork)
from ast import arg
from macropy.core.macros import Macros
from macropy.core.quotes import macros, q, ast_literal
macros = Macros()  # this module's macro registry

@macros.expr
def let(tree, args, **kw):  # args: sequence of ast.Tuple: (k1, v1), (k2, v2), ..., (kn, vn)
    names  = [k.id for k, _ in (a.elts for a in args)]
    values = [v for _, v in (a.elts for a in args)]
    lam = q[lambda: ast_literal[tree]]
    lam.args.args = [arg(arg=x) for x in names]  # the declared names become the lambda's formal parameters
    return q[ast_literal[lam](ast_literal[values])]

@macros.expr
def letseq(tree, args, **kw):
    if not args:
        return tree
    first, *rest = args
    return let.transform(letseq.transform(tree, *rest), first)

Note how .transform killed all the boilerplate in writing those macro definitions, which is excellent. Usage:

let((x, 1),
    (y, 2))[
      print(x + y)]

letseq((x, 1),
       (x, x+1))[
         print(x)]
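Roughly, these expand to the following hand-written equivalents (eliding MacroPy bookkeeping such as source locations):

# let((x, 1), (y, 2))[print(x + y)]  becomes, roughly:
(lambda x, y: print(x + y))(1, 2)

# letseq((x, 1), (x, x + 1))[print(x)]  becomes, roughly:
(lambda x: (lambda x: print(x))(x + 1))(1)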

The new identifiers are declared as bare names - being able to do this relies on the fact that the input is an AST. Sure, we could place the bindings at the beginning of the tree:

let[((x, 1),
     (y, 2)),
      print(x + y)]

but a separate bindings section looks more readable.

Why some form of named args - it would let us do this:

let(x=1,
    y=2)[
      print(x + y)]

which looks more pythonic. It also allows neat new stuff like args with default values in λ, but I now think let is overall a better example.

Finally, this particular kwargs hack fixes an asymmetry in the API; if you wrote mac(x=5)[...], which is very pythonic, it would be silently ignored, whereas mac(5) would place the 5 into args (as a Num).

Whether an args-like arcane mechanism is needed at all is another question. If it can be axed altogether, that would simplify things. I didn't see this angle before.

In the context of the new .transform, do you have a proposal (idea, not code) on how to handle named args from the use site? Normal run-time code obviously won't call mac.transform(tree, *args, **kwargs); it will invoke the macro as mac(a0, ..., an, k0=v0, ..., km=vm)[...] (hypothetical syntax if named args are allowed).

I think some isolation is needed; it is perfectly valid to define a let variable called gen_sym or similar, and it shouldn't conflict with MacroPy internals. Similarly, let should not always bind a gen_sym just because that name happens to exist in **kw. For let (and similarly for λ), there needs to be a way to tell apart user-given vs. MacroPy-internal kwargs. Shadowing is only a partial solution; ideally, both definitions (if present) should be accessible in the macro code.

Technologicat commented 5 years ago

Since I promised "neat new stuff", here's also an example on λ (all safeties stripped):

# (same imports and macros setup as in the let example above)
@macros.expr
def λ(tree, args, kwargs, **kw):  # <-- requires the kwargs hack
    withdefault_names = [k.arg for k in kwargs]  # keyword names become parameter names...
    defaults = [k.value for k in kwargs]         # ...and their value ASTs become the defaults
    names = [k.id for k in args] + withdefault_names
    newtree = do.transform(tree)
    lam = q[lambda: ast_literal[newtree]]
    lam.args.args = [arg(arg=x) for x in names]
    lam.args.defaults = defaults  # defaults apply to the last n args
    return lam

@macros.expr
def do(tree, **kw):
    ... # beside the point; see unpythonic.syntax

Usage:

echo = λ(myarg="hello")[print(myarg),
                        myarg]
assert echo() == "hello"
assert echo("hi") == "hi"

count = let((x, 0))[
          λ()[x << x + 1,
              x]]
assert count() == 1
assert count() == 2

myadd = λ(x, y)[print("myadding", x, y),
                localdef(tmp << x + y),
                print("result is", tmp),
                tmp]
assert myadd(2, 3) == 5

The essential point is, kwargs is used to capture keyword nodes from the use site, where arg is the name and value is the AST node representing the value. These can be abused as an args-with-defaults declaration. (In a call, named args after positionals; in a function declaration, args with defaults last. Isomorphic, or close enough.)

No *args or **kwargs support yet, but in 3.5 that's not difficult to add. Just sanity-check that there is at most one * item and at most one ** item, and check their placement w.r.t. the other args. Extending this slightly could also give support for keyword-only args.

[edit] The count example requires let from unpythonic.syntax; the one posted above is there called simple_let and doesn't support assignments. Supporting an "assignment expression" requires some trickery which is beside the point here.

Technologicat commented 5 years ago

I just obsoleted my λ; this is much more pythonic, not to mention less brittle:

from ast import Lambda, List, Load, Tuple, copy_location
from macropy.core.walkers import Walker
# `do` here is the unpythonic.syntax do macro, invoked via the proposed .transform API

@macros.block
def multilambda(tree, **kw):
    @Walker
    def transform(tree, *, stop, **kw):
        if type(tree) is not Lambda or type(tree.body) is not List:
            return tree
        bodys = Tuple(elts=tree.body.elts, ctx=Load())
        bodys = copy_location(bodys, tree)
        stop()  # don't recurse over what do[] does
        bodys = transform.recurse(bodys)  # but recurse over user code
        tree.body = do.transform(bodys)
        return tree
    yield transform.recurse(tree)

Usage:

with multilambda:
    echo = lambda x: [print(x), x]
    assert echo("hi there") == "hi there"

    count = let((x, 0))[
              lambda: [x << x + 1,
                       x]]
    assert count() == 1
    assert count() == 2

    t = lambda: [[1, 2]]
    assert t() == [1, 2]

The pythonic let use case still stands; there named arguments would be useful.

catb0t commented 5 years ago

Sorry -- this has little to do with the interesting discussion of late, but I wonder about the contrived code example in the first comment.

I don't understand how this works, in two ways, even in custom MacroPy or unpythonic.

x and y are not known names at the point where λ is called. The comment about needing named arguments comes later in the code, so apparently λ(x, y) works, but I don't know how.

myadd = λ(x, y)[print(x, y), x + y]

λ is a function called with two unassigned symbols, which returns a function-like callable, indexable object; this is fine.

But also, print(x, y) is unavoidably evaluated as soon as it is seen by the Python interpreter, and so the slice / indexing object becomes [None, x + y].

Perhaps from multilambda import macros, λ puts all code after it in a big try...except NameError: block?

And the import also enables some "lazy-loading" feature of the Python interpreter so that print(x, y) is not evaluated to None immediately? Or did you mean to write lambda: print(x, y)?

Technologicat commented 5 years ago

Cat: it's a macro thing. :)

Roughly speaking, a macro intercepts and transforms code before the rest of the interpreter even sees it. It just needs to be valid syntactically, so that Python's parser can convert the source code text to an AST. MacroPy then hands over (relevant parts of) this AST to macros, to be transformed into a new AST. Normal run-time interpretation starts only after all macros have "run" (been expanded). This gives some flexibility normal code doesn't have.

The λ is a macro; it looks like a function call, but it's subtly different. The [...] are part of the macro syntax in MacroPy; they delimit the body, i.e. the main stuff that goes in. The (...) delimit macro arguments (args) - these are also ASTs, just placed inside (...) instead of [...].

Both args and body are sent to the same "call" of the macro. Hence, λ(arg0, ...)[body0, ...] is just one operation, not two. The undefined names are never seen by the interpreter - they are transformed into argument names in a lambda.
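To make that concrete, here is what Python's parser alone produces for such an invocation (a sketch using only the stdlib ast module; MacroPy then picks this Subscript apart and hands the Call's arguments and the indexed expression to the macro):

import ast

tree = ast.parse("λ(x, y)[print(x, y), x + y]", mode="eval").body
print(ast.dump(tree))
# a Subscript node: its .value is the Call λ(x, y), and its slice holds the
# tuple (print(x, y), x + y) -- all still unevaluated AST, so no NameError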

Now, unpythonic does a bit of magic - the body of λ gets wrapped with an unpythonic.seq.do, which (in its normal runtime code part) takes a list of regular old Python lambdas, and runs them one by one. (There are some technical details to support variables local to the "do", beside the point here.)

The "do" macro, which is what the λ macro actually inserts, then makes this a bit easier to use, by taking the code entered by the user, and wrapping each item in a lambda - automatically - so that execution is delayed until the underlying unpythonic.seq.do actually runs.

Hope this helps :)

[edit] fix text formatting

catb0t commented 5 years ago

Yes, it helps very much! I sort of thought Python tries to resolve names at parse time and complains at runtime, but this is all very interesting to learn :)

Technologicat commented 5 years ago

Cat: AFAIK, Python basically resolves everything at runtime. Only reserved words such as import and def always mean what we expect them to; almost anything else can be overridden (either by rebinding the original or shadowing it by something more local) from anywhere at any time. :)

I've sometimes tripped over this myself, when writing a context manager, declaring

    def __exit__(self, type, value, traceback):
        ...

and then wondered why a call to the built-in type() from inside that method fails to work. :)
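A minimal illustration (hypothetical Demo class); the parameter shadows the builtin, and one workaround is to reach it explicitly via the builtins module:

import builtins

class Demo:
    def __enter__(self):
        return self
    def __exit__(self, type, value, traceback):
        # `type` here is the exception class (or None), shadowing the builtin;
        # builtins.type still gets at the real one
        print(builtins.type(value))
        return False

with Demo():
    pass  # prints <class 'NoneType'>, since no exception was raised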