Make tilde automatically quote its arguments

johnmyleswhite commented 10 years ago

In the past we've talked about making the tilde operator do something special to make statistical functions look nicer. One simple approach would take ex1 ~ ex2 and automatically wrap it in a quote call.

This would allow us to change clunky interfaces like glm(:(y ~ x)) into the nicer glm(y ~ x).

I'd love to see something like this happen for the 0.3 release, since it will allow us to provide a cleaner (and more familiar) interface for a lot of statistical functions. I suspect it wouldn't even be a badly breaking change, since I'm not aware of anyone using the tilde operator except people doing statistics in Julia.

StefanKarpinski commented 10 years ago

I'm on board with this. I think that what it should do is invoke the @tilde macro in the current scope, which can then construct a Formula object or whatever else.

johnmyleswhite commented 10 years ago

What's the current scope defined as? If I give a definition in DataFrames of @tilde, then someone else gives a definition in Package P, and I call glm(y ~ x) in Main, where does it go?

Keno commented 10 years ago

Depends on the usual scoping rules, I guess. Maybe rewrite y ~ x to ~(:(y),:(x)) so that y DataFrames.~ x could become DataFrames.~(:(y),:(x)), though now that I'm writing this out, I'm not sure it's a great idea.

Keno commented 10 years ago

or rather @~

StefanKarpinski commented 10 years ago

Oops. Sorry.

toivoh commented 10 years ago

I use ~ in PatternDispatch.jl to make two patterns match the same thing, e.g.

@pattern f(v ~ [x]) = (v,x)  # matches a single-element vector v, binding x to the element

But that usage is restricted to function signatures that are wrapped with the @pattern macro, so if x ~ y is replaced by @tilde(x, y) it shouldn't be a problem for me to adapt the parsing. I guess that the precedence would be the same?

When it comes to how the right @tilde macro would end up in the local scope, I see two possibilities:

1. DataFrames exports @tilde, so you get it with using DataFrames. Using x ~ y without DataFrames (or without defining @tilde by some other means) produces an error.
2. Base exports @tilde and implements it to call a tilde function that DataFrames can overload.

I would lean toward the former. The latter feels needlessly complex and too much like monkey patching.

JeffBezanson commented 10 years ago

I could parse this as Expr(:~, x, y), and then lower that by default to a macro call to @tilde or @~. Or it could just be parsed as a macro call to @~.

toivoh commented 10 years ago

@JeffBezanson: When would it be lowered? After the AST has been converted from surface syntax?

kmsquire commented 10 years ago

I would vote for parsing as Expr(:~, x, y), so that (I assume) non-macro uses are possible.

JeffBezanson commented 10 years ago

The macro expander would have to treat Expr(:~, x, y) as a macro call to @~ if it encountered such an expression (i.e. no macro transformed it first).

toivoh commented 10 years ago

@kmsquire: Non-macro uses should be possible by defining your @tilde macro like e.g.

macro tilde(x,y)
    esc(:( tilde($x, $y) ))
end

which makes @tilde(x,y) reduce to tilde(x, y).

kmsquire commented 10 years ago

Thanks, Toivoh and Jeff.

johnmyleswhite commented 10 years ago

If people are happy with DataFrames being "in charge" of @tilde and then requiring that other tools like PatternDispatch.jl operate at macro time, that seems like an alright solution. I'm a little worried that someone will want to change how it gets interpreted and end up pulling the whole thing down, but that's probably just paranoia.

toivoh commented 10 years ago

Thinking about this a little more, Debug.jl would need to be able to expand a tilde expression with macroexpand. I think the simplest would be if ~ were just replaced with a @tilde invocation, then the AST could be handled just like now (by Debug.jl and others). I'm not sure what advantage it would bring to introduce a new Expr(:~, ...) type.

johnmyleswhite commented 10 years ago

Sorry for being dense, but Debug.jl uses tilde internally, but doesn't export it? If not, how would I use both DataFrames and Debug?

toivoh commented 10 years ago

No problem, I realize that I wasn't very clear. Debug.jl doesn't use tilde at all, but I want it to be able to debug code that does. To do that, it has to expand all macros in instrumented code to make sure that it doesn't miss any variable declarations or possible trap points. I guess that the DataFrames definition of @tilde would not generate either, but if some other package defines @tilde differently, it might. It's a corner case, but I hate to leave gotchas in my code that I know about.

johnmyleswhite commented 10 years ago

Thanks for helping me understand. I guess my original concern that people may step on each other's toes still holds, but I'd rather we move forward with something like @tilde than do nothing. It really will make the stats code a lot more enjoyable to write.

JeffBezanson commented 10 years ago

I can add this easily. What should the associativity be? Should x~y~z be ~(~(x,y),z), ~(x,~(y,z)), or ~(x,y,z)?

dmbates commented 10 years ago

In model formulas there are very few cases of multiple tildes and I don't think the issue would arise. I found a use for a y ~ f(x) ~ A + b syntax in R once but I wouldn't design that code the same way in Julia.

If a decision is needed I would vote for x ~ y ~ z being equivalent to ~(x,y,z).

Keno commented 10 years ago

What are we gonna do with the current use of ~ (boolean negation)?

JeffBezanson commented 10 years ago

It will stay the same.

StefanKarpinski commented 10 years ago

How about varargs parsing instead of associative binary?

JeffBezanson commented 10 years ago

That's the third option above.

johnmyleswhite commented 10 years ago

Sweet! Thank you!

-- John

On Jan 10, 2014, at 5:25 PM, Jeff Bezanson notifications@github.com wrote:

Closed #4882 via a007350.


StefanKarpinski commented 10 years ago

Oh, right. That then.

cdsousa commented 10 years ago

Is this feature somehow reserved? Is it documented? Is the "overloading" of this macro acceptable for uses other than to create Formula objects, when the DataFrames package is not used? E.g.,

macro ~(d,k)
    :($d[$(Meta.quot(k))])
end

> mydict = [:x => 123, :y => 456]
> mydict~x
123

I guess the answer is that it must be reserved, but I would like to be sure.

JeffBezanson commented 10 years ago

No, it's not reserved for DataFrames. You simply get whatever definition of @~ is visible.

johnmyleswhite commented 10 years ago

That said, if you use this in a different way, you should probably advertise that your package is not compatible with DataFrames, since it would break things like GLM.

Having worked with this for a while, I think it would be reasonable to have it always return a type that behaves like Formula, which is effectively just a sequence of two quoted expressions. Then you can easily allow different functions to use multiple dispatch to give different semantics to that Formula type.

cdsousa commented 10 years ago

Thanks for the answers. I'm not planning to use it in any way, that was just to clarify my view of the language :) Thanks.

tkelman commented 9 years ago

Is there a good reason this couldn't have been done as @glm(y ~ x) from the beginning? Macro parsing of ~ is a pretty fishy special case to have hiding in the language IMO.

ScottPJones commented 9 years ago

@tkelman, I brought this up also... and was shot down... but I still think this deserves a breaking change to stick the ~ in a macro precisely as you described instead of it being a special case macro.

johnmyleswhite commented 9 years ago

@tkelman, I would support getting rid of the specialized parsing of ~ in a future Julia release.

tkelman commented 9 years ago

That is good to know, thanks. Is DataFrames the only package that currently has an implementation of the @~ macro? Looking into it a bit, it looks like you actually want something that creates a Formula type which various other functions like lm would operate on, so it may be better in the end to have a dedicated macro that outputs a formula object rather than changing all the fitting routines into macros. Would need to learn more about how it all works currently.
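
For concreteness, here is a minimal sketch of what such a dedicated macro could look like. It assumes a current Julia (1.x), where y ~ x inside a macro call parses as an ordinary call to ~ rather than as a macro call. The names Formula and @formula are hypothetical and chosen only for illustration; this is not the DataFrames implementation.

```julia
# Hypothetical container: holds both sides of `lhs ~ rhs` unevaluated.
struct Formula
    lhs
    rhs
end

# Hypothetical macro: accepts `lhs ~ rhs` and builds a Formula from it.
macro formula(ex)
    # Assumes `~` parses as a plain call, Expr(:call, :~, lhs, rhs).
    if !(ex isa Expr && ex.head == :call && length(ex.args) == 3 && ex.args[1] == :~)
        error("expected an expression of the form lhs ~ rhs")
    end
    lhs, rhs = ex.args[2], ex.args[3]
    # Quote both sides so they are stored as symbols/expressions, not evaluated.
    return :(Formula($(Meta.quot(lhs)), $(Meta.quot(rhs))))
end

f = @formula(y ~ x + z)
f.lhs  # :y
f.rhs  # :(x + z)
```

Fitting routines such as lm could then stay ordinary functions that take a Formula argument, so only the formula itself needs a macro.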

johnmyleswhite commented 9 years ago

I think every package that does linear regression uses that notation, so it likely affects MixedModels and NLReg as well.

andyferris commented 8 years ago

I just found out about this oddball feature today. Have people thought about the future of this lately?

I wonder if we could have @~ defined in base to go to some overloadable function, so it can be shared between many packages?

vtjnash commented 8 years ago

@~ is an overloadable function, there's just not much useful for it to dispatch on, so realistically you can only import it from one package at a time.

andyferris commented 8 years ago

Hmm... OK I was just speculating.

For me - it would be nice to have something both aware of expressions and of the types of things around it. Or have general infix macros, or something.

Otherwise, as a single special case, this seems rather unlike the rest of the language.

andyferris commented 8 years ago

For me - it would be nice to have something both aware of expressions and of the types of things around it.

E.g. This might be something like a generator: MyType(a ~ b) might do something while YourType(a ~ b) might mean something quite different.

tkelman commented 8 years ago

We're planning on getting rid of this for 0.6. It requires preparing a more Julian implementation of the formula DSL in the JuliaStats packages.

andyferris commented 8 years ago

Ok thanks for the update, Tony!

JuliaLang / julia

Make tilde automatically quote its arguments #4882