Closed johnmyleswhite closed 10 years ago
I'm on board with this. I think that what it should do is invoke the @tilde macro in the current scope, which can then construct a Formula object or whatever else.
What's the current scope defined as? If I give a definition in DataFrames of @tilde, then someone else gives a definition in Package P, and I call glm(y ~ x) in Main, where does it go?
Depends on the usual scoping rules I guess. Maybe rewrite y ~ x to ~(:(x),:(y)) so that y DataFrames.~ x could become DataFrames.~(:(x),:(y)), though now that I'm writing this out, I'm not sure it's a great idea.
or rather @~
Oops. Sorry.
I use ~ in PatternDispatch.jl to make two patterns match the same thing, e.g.
@pattern f(v ~ [x]) = (v,x) # matches a single-element vector v, binding x to the element
But that usage is restricted to function signatures that are wrapped with the @pattern macro, so if x ~ y is replaced by @tilde(x, y) it shouldn't be a problem for me to adapt the parsing. I guess that the precedence would be the same?
When it comes to how the right @tilde macro would end up in the local scope, I see two possibilities:
- DataFrames exports @tilde, so you get it with using DataFrames. Using x ~ y without DataFrames (or without defining @tilde by some other means) produces an error.
- Base exports @tilde and implements it to call a tilde function that DataFrames can overload.
I would lean toward the former. The latter feels needlessly complex and too much like monkey patching.
I could parse this as Expr(:~, x, y), and then lower that by default to a macro call to @tilde or @~. Or it could just be parsed as a macro call to @~.
@JeffBezanson: When would it be lowered? After the AST has been converted from surface syntax?
I would vote for parsing as Expr(:~, x, y), so that (I assume) non-macro uses are possible.
The macro expander would have to treat Expr(:~, x, y) as a macro call to @~ if it encountered such an expression (i.e. no macro transformed it first).
@kmsquire: Non-macro uses should be possible by defining your @tilde macro like e.g.
macro tilde(x,y)
esc(:( tilde($x, $y) ))
end
which makes @tilde(x,y) reduce to tilde(x, y).
Thanks, Toivoh and Jeff.
If people are happy with DataFrames being "in charge" of @tilde and then requiring that other tools like PatternMatching.jl operate at macro time, that seems like an alright solution. I'm a little worried that someone will want to change how it gets interpreted and end up pulling the whole thing down, but that's probably just paranoia.
Thinking about this a little more, Debug.jl would need to be able to expand a tilde expression with macroexpand. I think the simplest would be if ~ were just replaced with a @tilde invocation, then the AST could be handled just like now (by Debug.jl and others). I'm not sure what advantage it would bring to introduce a new Expr(:~, ...) type.
Sorry for being dense, but does Debug.jl use tilde internally without exporting it? If not, how would I use both DataFrames and Debug?
No problem, I realize that I wasn't very clear. Debug.jl doesn't use tilde at all, but I want it to be able to debug code that does. To do that, it has to expand all macros in instrumented code to make sure that it doesn't miss any variable declarations or possible trap points. I guess that the DataFrames definition of @tilde would not generate either, but if some other package defines @tilde differently, it might. It's a corner case, but I hate to leave gotchas in my code that I know about.
Thanks for helping me understand. I guess my original concern that people may step on each other's toes still holds, but I'd rather we move forward with something like @tilde than do nothing. It really will make the stats code a lot more enjoyable to write.
I can add this easily.
What should the associativity be? Should x~y~z be ~(~(x,y),z), ~(x,~(y,z)), or ~(x,y,z)?
In model formulas there are very few cases of multiple tildes and I don't think the issue would arise. I found a use for a y ~ f(x) ~ A + b syntax in R once but I wouldn't design that code the same way in Julia.
If a decision is needed I would vote for x ~ y ~ z being equivalent to ~(x,y,z).
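To make the three candidate shapes concrete, here is a minimal sketch that builds the left-nested and right-nested forms by hand and flattens either one into the varargs form. The helper names `is_tilde` and `tilde_operands` are hypothetical and exist only for this illustration; the expressions are constructed with `Expr` directly so that nothing here depends on how the parser actually associates ~.

```julia
# Hypothetical helper: true for a call expression whose callee is the symbol ~.
is_tilde(ex) = ex isa Expr && ex.head === :call &&
               length(ex.args) >= 3 && ex.args[1] === :~

# Hypothetical helper: collect the operands of nested ~ calls, left to right.
function tilde_operands(ex)
    is_tilde(ex) || return Any[ex]
    ops = Any[]
    for arg in ex.args[2:end]
        append!(ops, tilde_operands(arg))
    end
    return ops
end

# The two binary nestings from the question, built by hand.
left_nested  = Expr(:call, :~, Expr(:call, :~, :x, :y), :z)   # ~(~(x,y),z)
right_nested = Expr(:call, :~, :x, Expr(:call, :~, :y, :z))   # ~(x,~(y,z))

# The varargs form, recovered from either nesting.
flat = Expr(:call, :~, tilde_operands(left_nested)...)        # ~(x,y,z)
```

Both nestings flatten to the same operand list, which is one argument for the varargs form: a consumer that only wants the sequence of sides does not have to care about associativity.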
What are we gonna do with the current use of ~ (boolean negation)?
It will stay the same.
How about varargs parsing instead of associative binary?
That's the third option above.
Sweet! Thank you!
-- John
On Jan 10, 2014, at 5:25 PM, Jeff Bezanson notifications@github.com wrote:
Closed #4882 via a007350.
Oh, right. That then.
Is this feature somehow reserved? Is it documented? Is the "overloading" of this macro acceptable for uses other than to create Formula objects, when the DataFrames package is not used? E.g.,
macro ~(d,k)
:($d[$(Meta.quot(k))])
end
> mydict = [:x => 123, :y => 456]
> mydict~x
123
I guess the answer is that it must be reserved, but I would like to be sure.
No, it's not reserved for DataFrames. You simply get whatever definition of @~ is visible.
That said, if you use this in a different way, you should probably advertise that your package is not compatible with DataFrames, since it would break things like GLM.
Having worked with this for a while, I think it would be reasonable to have it always return a type that behaves like Formula, which is effectively just a sequence of two quoted expressions. Then you can easily allow different functions to use multiple dispatch to give different semantics to that Formula type.
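A rough sketch of that idea, under stated assumptions: `FormulaSketch`, `@formula_sketch`, `response`, and `swap_sides` are all hypothetical names invented for this illustration and are not the DataFrames implementation. It also assumes a Julia version where `lhs ~ rhs` parses as an ordinary call to ~, so the macro receives a `:call` expression.

```julia
# Hypothetical stand-in for a Formula-like type: just two quoted expressions.
struct FormulaSketch
    lhs::Any
    rhs::Any
end

# Hypothetical macro: turn `lhs ~ rhs` into a FormulaSketch holding both sides unevaluated.
macro formula_sketch(ex)
    (ex isa Expr && ex.head === :call &&
     length(ex.args) == 3 && ex.args[1] === :~) ||
        error("expected an expression of the form lhs ~ rhs")
    lhs, rhs = ex.args[2], ex.args[3]
    return :(FormulaSketch($(QuoteNode(lhs)), $(QuoteNode(rhs))))
end

# Two hypothetical functions that dispatch on the same type
# but read the formula in different ways.
response(f::FormulaSketch) = f.lhs
swap_sides(f::FormulaSketch) = FormulaSketch(f.rhs, f.lhs)

f = @formula_sketch(y ~ x + z)
```

Because the macro only packages the two sides, any meaning is assigned later by whichever function receives the value, so packages can add methods for the shared type rather than competing over a single macro definition.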
Thanks for the answers. I'm not planning to use it in any way, that was just to clarify my view of the language :) Thanks.
Is there a good reason this couldn't have been done as @glm(y ~ x) from the beginning? Macro parsing of ~ is a pretty fishy special case to have hiding in the language IMO.
@tkelman, I brought this up also... and was shot down... but I still think this deserves a breaking change to stick the ~ in a macro precisely as you described instead of it being a special case macro.
@tkelman, I would support getting rid of the specialized parsing of ~ in a future Julia release.
That is good to know, thanks. Is DataFrames the only package that currently has an implementation of the @~ macro? Looking into it a bit, it looks like you actually want something that creates a Formula type which various other functions like lm would operate on, so it may be better in the end to have a dedicated macro that outputs a formula object rather than changing all the fitting routines into macros. Would need to learn more about how it all works currently.
I think every package that does linear regression uses that notation, so it likely affects MixedModels and NLReg as well.
I just today found out about this oddball feature. Have people thought about the future of this lately?
I wonder if we could have @~ defined in base to go to some overloadable function, so it can be shared between many packages?
@~ is an overloadable function, there's just not much useful for it to dispatch on, so realistically you can only import it from one package at a time.
Hmm... OK I was just speculating.
For me - it would be nice to have something both aware of expressions and of the types of things around it. Or have general infix macros, or something.
Otherwise, as a single special case, this seems rather unlike the rest of the language.
For me - it would be nice to have something both aware of expressions and of the types of things around it.
E.g. This might be something like a generator: MyType(a ~ b) might do something while YourType(a ~ b) might mean something quite different.
We're planning on getting rid of this for 0.6. It requires preparing a more Julian implementation of the formula DSL in the JuliaStats packages.
Ok thanks for the update, Tony!
In the past we've talked about making the tilde operator do something special to make statistical functions look nicer. One simple approach would take ex1 ~ ex2 and automatically wrap it in a quote call. This would allow us to change clunky interfaces like glm(:(y ~ x)) into the nicer glm(y ~ x).
I'd love to see something like this happen for the 0.3 release, since it will allow us to provide a cleaner (and more familiar) interface for a lot of statistical functions. I suspect it wouldn't even be a badly breaking change, since I'm not aware of anyone using the tilde operator except people doing statistics in Julia.
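As an illustration of the "wrap it in a quote" idea, here is a minimal sketch using an explicit macro instead of special parsing. The macro name `@quoted_formula` is hypothetical, and the sketch assumes a Julia version where `y ~ x` parses as an ordinary call to ~, so the macro sees a `:call` expression whose first argument is the symbol ~.

```julia
# Hypothetical macro: accept `lhs ~ rhs` and hand it back as a quoted expression,
# so the caller does not have to write :( ... ) at the call site.
macro quoted_formula(ex)
    (ex isa Expr && ex.head === :call &&
     length(ex.args) == 3 && ex.args[1] === :~) ||
        error("expected an expression of the form lhs ~ rhs")
    return QuoteNode(ex)   # the expression itself becomes the runtime value
end

# Neither y nor x needs to be defined: the formula is captured, not evaluated.
f = @quoted_formula(y ~ x + z)
```

The value bound to `f` is the same kind of object a caller would otherwise have to build by hand with `:(y ~ x + z)`, which is the clunkiness the proposal is trying to remove.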