Pattern matching for expressions

MikeInnes commented 9 years ago

Proposal: I'd like to incorporate some form of pattern matching for expressions, in particular ExpressionMatch.jl. I'd love for the macro-writers among you to give feedback and help me hash out the API and implementation.

Why `@match`

Working with Julia Expr objects is a pain, requiring an encyclopedic knowledge of implementation details to do the right thing and handle strange edge cases. Pattern matching provides an answer: expressions are described as they are written, so you don't have to remember two representations of everything.

@tkelman points out:

Code that uses pattern matching looks neat, but to me it comes across as a bit of a too-clever novelty syntax pun, and people who've never seen or used pattern matching before (which I'll postulate is the vast majority of Julia users) are going to have a really hard time understanding what's going on.

I think this is a very fair point, and if we were talking about implementing pattern matching over general Julia types I might agree. But working with expressions already has a learning curve above and beyond general types; in order to read or write a macro one must know or work out how every type of expression is internally specified. And that's before you get to the confusing details like the difference between Expr(:quote, :x) and QuoteNode(:x) (can you remember which is created by :(Foo.bar) and @m(Foo.bar), I wonder?), which expressions create extraneous begin blocks for a single element, etc. etc. I've written a ton of macros and these things still trip me up every time – people who haven't spent years banging their heads against Expr objects (which I'll postulate is the vast majority of Julia users) are going to have a really hard time understanding what's going on.

Consider the readme example:

julia> ex = quote
         type Foo
           x::Int
           y
         end
       end

julia> if isexpr(ex.args[2], :type)
         (ex.args[2].args[2], ex.args[2].args[3].args)
       end
(:Foo,{:( # line 3:),:(x::Int),:( # line 4:),:y})

julia> @match ex begin
         type T_
           fields__
         end -> (T, fields)
       end
(:Foo,{:(x::Int),:y})

@match is probably a bit surprising the first time you see it, but that's a one-off cost – it only has to click once and you understand it every time it's used. The syntax x.args[2].args[3].args may not surprise you initially, but what about its intent? You have to wade through it every time you see it.

Hell, you've only seen it once, and I'll bet you find it much easier to see what's going on here than here. The former adds new features and handles a bunch of edge cases better – can you tell me which ones? (30 seconds, go!) Of course, it's possible to read it in the same way it's possible to read assembly code, but that doesn't mean we have to work with it.

This is a tradeoff, like any other design decision, but it's clear to me at least which side I'd rather be on. If you had to remember how a sparse matrix was implemented every time you used it, we'd never settle for it.

The Case for Base

The doc system snippet above is the shining example of a use case of this, because we want to dispatch on a bunch of different syntax. Checking for things like :(:(Base.@time)) (yes, double-quoted – how is that represented internally again?) and getting the edge cases right is a nightmare. With @match, it's crystal clear what is and isn't supported by @doc, and you don't have to be a macro wizard to suggest or implement new ideas – the accessibility of the code is greatly improved, and it's far from a novelty trick. If people can get behind this then I imagine it wouldn't hurt to have "real" support as part of the standard library, too, and there are plenty of other cases – in Base and outside of it – where macros could be improved via this syntax.

tkelman commented 9 years ago

Thanks for opening this.

If you had to remember how a sparse matrix was implemented every time you used it, we'd never settle for it.

Bad example. You do have to remember how a sparse matrix is implemented if you're going to write a fast algorithm for working with them. Same with macros, you don't need to know things work to just use them if you trust the implementation (which, for @match, I don't yet), only to write them.

I think the biggest thing I dislike about this implementation is the syntactically significant underscores. That's what makes this feel too-clever. And how well-tested is ExpressionMatch.jl? You say you're removing corner cases, when you're just moving the complexity to the macro implementation which has no tests of its own (yet). If this weren't brand new, if it were solidly tested, widely used code by dozens of packages for their own macros, I'd be sold more easily.

people who haven't spent years banging their heads against Expr objects (which I'll postulate is the vast majority of Julia users) are going to have a really hard time understanding what's going on.

A huge part of that is the lack of formal AST documentation.

yuyichao commented 9 years ago

I like the argument of not having to figure out the internal structure of a AST especially since the AST is not documented and is more or less intentionally kept that way because we are breaking it from time to time.

Here are some of my concerns/questions.

From your example, it seems that you can both match a single expression or a list of it, How can someone figure that out and in general how well is this documented. I'm especially concern about this in case of a AST change since people will think this is a public API that can be relied on.
I noticed that you filter out the line number node. I agree that this is probably what one would want most of the time. However, I think having them is sometimes very important for getting good debug infos.

yuyichao commented 9 years ago

@tkelman Just noticed that you forgot to hit your y before pasting the link =)

tkelman commented 9 years ago

Indeed, partly on purpose, so Mike can make me look bad :)

MikeInnes commented 9 years ago

You do have to remember how a sparse matrix is implemented if you're going to write a fast algorithm for working with them. Same with macros, you don't need to know things work to just use them...

Except I'm talking about working with Exprs, not macros. Sure you still need to know how matrices are implemented occasionally, just not for every algorithm that uses them – you do with Exprs. It's not really worth debating where the analogy does or doesn't work, though.

It being very recent is a reasonable concern – it'd be good to hash out the details a bit more in practice, which is partly why I hoped to leave it as an implementation detail of the doc system for now. I should point out that the doc system itself uses every feature @match has, however – can we get coverage data for PRs? – so the standard julia build and tests already provide a pretty stringent test of robustness, even if the package itself is lacking.

yuyichao commented 9 years ago

A huge part of that is the lack of formal AST documentation.

Yeah, this is the reason I have a mixed feeling about this (as you can probably tell from my previous post). On one hand, having @match will expose a more or less nice interface for macro writers. On the other hand, I'm wondering how reliable it would be (document / testing / breakage).

MikeInnes commented 9 years ago

@yuyichao Documentation is always good and definitely doable. I think that expressions changing under the hood is probably a great reason to abstract away the details – using @match will mean that changes to the AST break less code.

I only filter out the line numbers when using a slurp x__, so it's entirely possible to keep them if you want them. Of course, you can still drop down to the lower-level interface if you want.

tkelman commented 9 years ago

You can use CoverageBase on a local build.

Isn't writing (or debugging) a macro pretty much the only time you'd ever need to work with Expr objects? I've used Exprs outside of macros in the context of MathProgBase's expression-tree API and found working with them was fine and didn't really call for a DSL, though at least those expressions were fairly uniform and simple in nature.

I think we're past the point where we can get away with sneaking undocumented, under-tested features into base as implementation details of other code without review and discussion (e.g. Stefan and a few others have looked back with regret on the documentation system being tied to an entire markdown parser, in retrospect would've liked a more modular approach that didn't bring in as much new code)

MikeInnes commented 9 years ago

The documentation system isn't tied to an entire markdown parser. Having Markdown.jl in Base was always meant to be temporary pending more infrastructure to bundle packages with Base, and it'll be easy enough to split out once we have that. If you don't think that's reasonable, feel free to start a discussion on it.

kmsquire commented 9 years ago

My two cents:

I do happen to like pattern matching, and I would like to see it in Base.
I think pattern matching on expressions is a very good use case (and I've used Match.jl for that)
I'm not a big fan of using underscores to indicate variable capture. It probably simplifies the parsing, but I would prefer straight variable capture. (I'd really prefer to use $var, to be analogous with splicing, but that would require first class support in the parser.)
It would be unfortunate for Match.jl if @match were exported from Base (and I looked closely at the PR to see if that was the case, before @tkelman brought it up), unless it offered most of the functionality of Match.jl and could simply replace it.

mbauman commented 9 years ago

+1 for using $var and $(var...) to denote variable capture and "slurping". Although it's backwards from its real purpose, I think that it makes some of the underscore magic that I don't like more obvious. I think it could work just fine without any parsing changes:

julia> macro foo(ex)
           @show ex
           nothing
       end

julia> @foo begin
           type $T
              $(fields...)
           end -> (T, fields)
       end
ex = quote  # none, line 2:
    type $(Expr(:$, :T)) # none, line 3:
            $(Expr(:$, :((fields...,))))
        end->begin  # none, line 4:
            (T,fields)
        end
end

And with that, this starts feeling a little more first-class.

kmsquire commented 9 years ago

@mbauman, nice!

MichaelHatherly commented 9 years ago

I came here to suggest what @kmsquire and @mbauman have already done with using $var instead of underscores, so +1 for that syntax.

@one-more-minute you've also got _Expr syntax as well which seems to be for restricting the type of the match (?), using $ and :: for that, such as $(::Expr), seems quite nice.

mbauman commented 9 years ago

On the downside, it does add a bit of syntax noise:

    @match def begin
        ($f($(_...)) = $_)          -> funcdoc(meta, def)
        function $f($(_...)) $_ end -> funcdoc(meta, def)
        function $f end             ->  objdoc(meta, def)
        macro $m($(_...)) $_ end    -> namedoc(meta, def, symbol("@", m))
        type $T $_ end              -> typedoc(meta, def, namify(T))
        immutable $T $_ end         -> typedoc(meta, def, namify(T))
        (abstract $T)               -> namedoc(meta, def, namify(T))
        (bitstype $_ $T)            -> namedoc(meta, def, namify(T))
        (typealias $T $_)           ->  objdoc(meta, def)
        module $M $_ end            -> namedoc(meta, def, M)
        $(_::Expr)                  -> error("Unsupported @doc syntax $def")
        $_                          -> objdoc(meta, def)
    end

As an aside, ~~why do you need to "slurp" function arguments and macro arguments, but not type fields or the expressions in functions and modules?~~ Oh, right, because the latter are single blocks.

ssfrr commented 9 years ago

as a random data point I was pretty confused about all the underscores but the $ syntax made it much more clear what's going on.

In the cases where $f and $T are captured but not used in the matching expression (e.g. , (typealias $T $_) -> objdoc(meta, def) could those be replaced with $_ or am I missing something?

catawbasam commented 9 years ago

Maybe @matchmeta or @matchexpr ?

MichaelHatherly commented 9 years ago

could those be replaced with $_ or am I missing something?

Yes, objdoc doesn't require the typealias name directly so $T could be switched to $_.

ssfrr commented 9 years ago

@MichaelHatherly Thanks - I wanted to make sure I did in fact understand what was going on. :)

fcard commented 9 years ago

Some caution has to be had with $, however. For example, type $T $_ end parses to type (($T) $ (_)); end, not type ($T); ($_) end.

I have implemented something like this in the past, and from my usage of it I found it more useful to have a marking to indicate "match this with the expression literally" rather than one to indicate "this is a binding name", since binding names are much more common. Like so:

@match :(add(1,2)) begin
    (add::!)(x,y) -> x + y   # will return 3
    (sub::!)(x,y) -> x - y   # would return -1
           f(x,y) -> (f,x,y) # would return (:add, 1, 2)
end

I used (x::!) in my implementation, but honestly that's just because I couldn't think of anything else. In the case of "slurps", I just used x... and had support for x::? for reversing "literal" syntax, so one could write:

@match :[a,b,c...] begin
    (a,b,c...)           -> c # this returns [:(c...)]
    (a,b,((c::?)...)::!) -> c # this returns :c
end

Which is pretty ugly looking, but in my opinion it's better to have a few verbose corner cases with simplified common cases, than to have everything be equal but noisy. (although the punning of the :: syntax + its verbosity was one of the reasons I abandoned my implementation in the first place, but I'd like to throw the idea out there so that maybe someone can improve upon it)

(Other features that I had)

As for why I want it, apart from making many types of macros much easier to create, pattern matching has the potential to add automatic, descriptive error messages without the user having to add a bunch of asserts testing the heads of the Exprs and the shape of the args.

One cons I found is that pattern matching encourages the user to write complex, clever expressions when simpler ones would suffice. But if we permit macros, we should already be expecting some degree of self-control.

MikeInnes commented 9 years ago

It's interesting to hear, and useful to know, that people find the $ syntax more intuitive, although unfortunately I think the issue @fcard points out is a show stopper. Indeed, the main reason that underscores work well is exactly because they aren't significant usually – they're guaranteed not to affect the parsing in unintuitive ways, and they won't be ambiguous with any other expression you'd want to match.

@fcard – you should definitely put up a package if you can, I'd love to see your code. Swapping around literal and bound symbols is an interesting idea, although I'm not sure about the idea of writing :(2+2) as (+::!)(2, 2). (Though I guess you could restrict bindings to isidentifier symbols.)

MikeInnes commented 9 years ago

I think it'll be a lot easier to hash this stuff out in practice, so I'll just continue working on this as a package for now.

MikeInnes commented 9 years ago

What do people think of the quoting idea raised by @MichaelHatherly? e.g.

function docm(meta, def)
    @match def begin
        :(:(@m_)) -> return objdoc(meta, m)
        :(m_"") -> return objdoc(meta, m)
    end
    def = macroexpand(def)
    @match def begin
        :(f_(__) = _)            -> funcdoc(meta, def)
        :(function f_(__) _ end) -> funcdoc(meta, def)
        :(function f_ end)       ->  objdoc(meta, def)
        :(macro m_(__) _ end)    -> namedoc(meta, def, symbol("@", m))
        :(type T_ _ end)         -> typedoc(meta, def, namify(T))
        :(immutable T_ _ end)    -> typedoc(meta, def, namify(T))
        :(abstract T_)           -> namedoc(meta, def, namify(T))
        :(bitstype _ T_)         -> namedoc(meta, def, namify(T))
        :(typealias T_ _)        ->  objdoc(meta, def)
        :(module M_ _ end)       -> namedoc(meta, def, M)
        :_Expr                   -> error("Unsupported @doc syntax $def")
        :_                       -> objdoc(meta, def)
    end
end

I was coming round to the idea after typing the quotes accidently myself a couple times, but seeing it like that I'm not so sure.

MichaelHatherly commented 9 years ago

Yeah, it starts to lose quite a bit of elegance. Along with some kind of $ binding syntax it starts to make writing ex.args[1].args[1].args-type macros look more appealing in comparision...

If the macro was named @matchexpr like @catawbasam suggested in https://github.com/JuliaLang/julia/issues/12102#issuecomment-120535457 then the quotes wouldn't be necessary I think.

MikeInnes commented 9 years ago

@yuyichao Can you think of any breaking changes to the AST which have happened since v0.3 or v0.2? Hopefully with a good example I can prove the concept.

yuyichao commented 9 years ago

What I'm thinking about is sth like https://github.com/JuliaLang/julia/issues/9503 (which did not happen. At least not yet.) I think I agree in general @match will have less breakage but I'm just not very sure how reliable it is (without testing/documentation).

There are also other AST/syntax changes in 0.4. More flexible kw arguments: https://github.com/JuliaLang/julia/issues/7704, =>, Pair and Dict: https://github.com/JuliaLang/julia/issues/6739, Union{}: https://github.com/JuliaLang/julia/pull/11432.

Some of them are surface syntax changes that might be out of the scope of @match to provide compatibility (or not?). Without a syntax change, I guess @match should work pretty well, though the question is still "how well".

With a surface syntax change, the current way to solve the compatibility issue is using @compat. Can @match play well with @compat (in any order) or does it need to provide compatibility by itself?

thautwarm commented 5 years ago

@kmsquire , @mbauman , @MichaelHatherly , @ssfrr It's so much exciting to find so many people have the same idea with me 3 years ago!!! I've already implemented this one, which is based on the homoiconicity of the pattern and the value to be matched, and strongly recommend you to read the doc for this: https://thautwarm.github.io/MLStyle.jl/latest/syntax/pattern/#Ast-Pattern-1

The performance could be amazing for it's purely static code generation.

using MLStyle
rmlines = @λ begin
    e :: Expr           -> Expr(e.head, filter(x -> x !== nothing, map(rmlines, e.args))...)
      :: LineNumberNode -> nothing
    a                   -> a
end
expr = quote
    struct S{T}
        a :: Int
        b :: T
    end
end |> rmlines

@match expr begin
    quote
        struct $name{$tvar}
            $f1 :: $t1
            $f2 :: $t2
        end
    end =>
    quote
        struct $name{$tvar}
            $f1 :: $t1
            $f2 :: $t2
        end
    end |> rmlines == expr
end

JuliaLang / julia