Open Aerlinger opened 10 years ago
I'm sympathetic to the need for a more incremental way of building complex plots. Of course, ggplot2 has this with an overloaded addition operator, so you can do things like:
p <- ggplot(df, aes(x, y))
p <- p + geom_line()
I didn't copy that, not because I don't like defining plots incrementally, but rather because I don't like commandeering arithmetic operators for non-arithmetic operations.
I like the gist of the syntax you're proposing, but I'm not sure how it could be implemented. With Julia's do syntax, this would get translated into something like:
p = plot(() -> begin
layer(df_cos, x=:x, y=:y, color="blue", Geom.line, Geom.ribbon)
layer(df_sin, x=:x, y=:y, color="yellow", Geom.line, Geom.ribbon)
end)
Only the last layer will be returned, with no way to associate the first layer with the plot.
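To make the problem concrete, here is a minimal sketch using toy stand-ins (toy_plot and toy_layer are hypothetical names, not Gadfly functions). It only illustrates that a do block is an anonymous function, so the caller sees just the value of its last expression:

```julia
# Toy stand-ins for illustration only; these are NOT Gadfly's API.
toy_layer(name) = "layer:" * name

# Any function that takes a function as its first argument can be called with `do`.
toy_plot(block::Function) = block()

result = toy_plot() do
    toy_layer("cos")   # evaluated, but its value is discarded
    toy_layer("sin")   # last expression: the only value the caller receives
end

println(result)   # prints "layer:sin"
```

The first toy_layer call runs, but nothing ties its result to the plot, which is the limitation described above.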
Here are two other possibilities:
Define a @plot macro that evaluates a bunch of statements in a block and splices them into a plot(...) call. So we'd do something like:
p = @plot begin
layer(df_cos, x=:x, y=:y, color="blue", Geom.line, Geom.ribbon)
layer(df_sin, x=:x, y=:y, color="yellow", Geom.line, Geom.ribbon)
end
Define a push! method over plots so they can be built incrementally.
p = plot()
push!(p, layer(df_cos, x=:x, y=:y, color="blue", Geom.line, Geom.ribbon))
push!(p, layer(df_sin, x=:x, y=:y, color="yellow", Geom.line, Geom.ribbon))
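The push! option can be sketched with toy types. SketchPlot and SketchLayer are hypothetical stand-ins assumed for illustration, and Gadfly's actual types and method differ:

```julia
# Hypothetical stand-ins; Gadfly's real Plot and Layer types are different.
struct SketchLayer
    name::String
end

struct SketchPlot
    layers::Vector{SketchLayer}
end
SketchPlot() = SketchPlot(SketchLayer[])

# Extend Base.push! so that a plot can be built one layer at a time.
function Base.push!(p::SketchPlot, l::SketchLayer)
    push!(p.layers, l)
    return p   # returning the plot mirrors Base.push! on collections
end

p = SketchPlot()
push!(p, SketchLayer("cos"))
push!(p, SketchLayer("sin"))
println(length(p.layers))   # prints 2
```

Because the method extends Base.push!, it reads like any other mutating collection operation in Julia.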
+1 for push!
I think the @plot begin ... end syntax seems like a very good one.
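For reference, a macro of this kind could look roughly like the sketch below. The names sketch_plot and @sketch_plot are made up for illustration, and the layers are plain strings rather than Gadfly layers:

```julia
# Hypothetical stand-in for plot(...); it just collects its arguments.
sketch_plot(layers...) = collect(layers)

# Splice every statement of a `begin ... end` block into a single call.
macro sketch_plot(block)
    # Drop the LineNumberNodes that the parser inserts between statements.
    stmts = filter(x -> !(x isa LineNumberNode), block.args)
    return esc(:(sketch_plot($(stmts...))))
end

layers = @sketch_plot begin
    "cos layer"
    "sin layer"
end

println(layers)
```

Unlike a do block, the macro sees every statement in the block, so both layers reach the call.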
I like the @plot begin .. end syntax as well, although macros sometimes make me a bit nervous. As to your point about the plot syntax, it would also be possible to pass the plot object to the block rather than using push!:
plot() do p
layer(p, df_cos, x=:x, y=:y, color="blue", Geom.line, Geom.ribbon)
layer(p, df_sin, x=:x, y=:y, color="yellow", Geom.line, Geom.ribbon)
end
Perhaps there's a way to treat the function as a thunk to avoid having to pass an unnecessary parameter to the block, although I'd have to think about it a bit more.
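A rough sketch of how passing the plot object into the block might work is shown here. DraftPlot, add_layer! and draft_plot are assumed names for illustration, not Gadfly functions:

```julia
# Hypothetical toy types for illustration; not Gadfly's API.
struct DraftPlot
    layers::Vector{String}
end

# Add a layer to the plot that was passed in, then return the plot.
add_layer!(p::DraftPlot, name::String) = (push!(p.layers, name); p)

function draft_plot(block::Function)
    p = DraftPlot(String[])
    block(p)    # the block mutates p; its own return value is ignored
    return p
end

built = draft_plot() do p
    add_layer!(p, "cos")
    add_layer!(p, "sin")
end

println(built.layers)
```

Since every call inside the block receives p explicitly, no layer is lost even though only the last expression's value is returned from the block.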
The push! approach feels better to me. One big advantage of not limiting pushability to a block scope is that you might want to add stuff after you evaluated the code that created the original plot. I'm not sure how that would work in IJulia, for example – would it update the original plot output or render the plot again with more stuff?
Let me elaborate a little on "feels better": push! is simple and it's obvious what's happening. The reason for a block context like the one plot() do p ... end provides would be if there's cleanup that needs to happen after all the incremental statements defining the plot have occurred. If that's not the case, then the block context is gratuitous. The reason for a macro would be to do something fancy that transforms the inner code, which seems really gratuitous – it's better to make the mechanics of what's happening obvious and slightly more verbose than to save a little typing at the cost of making it completely opaque what's happening. If you're modifying a plot object, then it should look like that's what's happening.
The macro seems like a pretty heavyweight solution to the problem of adding a , at the end of lines when using the plot function.
One big advantage of not limiting pushability to a block scope is that you might want to add stuff after you evaluated the code that created the original plot.
I agree, but the more I actually think about it the more I think push! may be unnecessary. Perhaps a plot should be considered immutable once created? Currently, I believe this is the case in Gadfly.
I vote for both :-).
The block scope makes the plot definition stand out, and it allows the rendering logic to be called on finalization. It can be cleanly built on top of the lower-level push!() API.
Passing a specialized layer function is visually more pleasing to me:
plot() do layer
layer(df_cos, x=:x, y=:y, color="blue", Geom.line, Geom.ribbon)
layer(df_sin, x=:x, y=:y, color="yellow", Geom.line, Geom.ribbon)
SVG("myplot.svg", 6inch, 3inch)
end
# prototype. Note that I'm not familiar with Gadfly.
function plot(block::Function)
    p = Plot()
    layer(args...; kwargs...) = push!(p, args...; kwargs...)
    target = block(layer)
    if isa(target, RenderTarget) || isdefined(:Cairo) && isinteractive()
        isa(target, RenderTarget) || (target = defaulttarget) # Cairo
        draw(target, p)
    end
    p
end
EDIT: The block syntax is also convenient while tweaking a plot at the REPL. When navigating the history, you get the whole plot definition, so you don't have to push!() many times... You could use a begin ... end block to that end, but I'm not sure that everyone will think about it. Providing the block API nudges users in the right direction.
Casting my vote for push!. For completeness though, append! could have some uses, especially when two complex plots need to be made that share a number of layers.
append!(p, default_layers)
append!(q, default_layers)
push!(p, another_layer)
push!(q, different_layer)
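As a rough illustration of that sharing pattern, the sketch below uses a toy LayerStack type with string layers. All names here are assumptions for the example, not Gadfly's API:

```julia
# Hypothetical toy container; layers are plain strings for illustration.
struct LayerStack
    layers::Vector{String}
end
LayerStack() = LayerStack(String[])

# Extend the Base collection functions; both return the stack itself.
Base.push!(s::LayerStack, l::String) = (push!(s.layers, l); s)
Base.append!(s::LayerStack, ls::Vector{String}) = (append!(s.layers, ls); s)

default_layers = ["grid", "axes"]
p = LayerStack()
q = LayerStack()

append!(p, default_layers)   # both plots start from the shared layers
append!(q, default_layers)
push!(p, "line")             # then each plot gets its own extra layer
push!(q, "points")

println(p.layers)
println(q.layers)
```

Each stack keeps its own vector, so the shared defaults are copied in and later pushes to p do not affect q.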
What about developing a chain operator and then using it to add additional layers, rather than nesting everything in a for loop?
Check out Hadley's successor to ggplot (ggvis) for how they're handling it. ggvis link
An example:
mtcars %>%
ggvis(~wt, ~mpg) %>%
layer_points() %>%
layer_model_predictions(model = "lm", se = TRUE)
and more in the ggvis cookbook
Generally, the push/append notation has the benefit of being modifiable (especially in the REPL). But the clarity argument doesn't convince me. It is clear to everyone that, inside a huge function, a function is being defined, even across hundreds of lines (which is bad style, but still clear). So why would you rather say:
equation(strangesyntax:here, 1+1)
equation(strangesyntax:here, 1+1)
rather than:
I'm doing math now{
1+1
1+1
}
However simple the push command is, it will never be as readable as a block. And we are not talking about plot(sin,0,pi) but rather sophisticated plots here. So repeatedly saying who you want to address is exactly what should be avoided!
For the above reasons, offering both syntaxes would be a perfect combination, giving the best of both worlds. But if we have to choose, a human-readable block is to be preferred!
And adding a comma after each argument is definitely a bad thing when there may be many of them! Anybody who has published plots should have encountered the unreadability of these plotting scripts.
+1 push! I think it reads better and is more intuitive.
@neilpanchal Thanks for reminding me about this. I just added a push! function.
The idea has been floated (for example in https://github.com/JuliaLang/julia/issues/11030) to make ++ a generic concatenation operator in Julia. If that happens, which I think would be reasonable, we might automatically get syntax like
plot(x=rand(10), y=rand(10)) ++
Geom.line ++ Geom.point ++
layer(x=rand(50), y=rand(50), Geom.hexbin)
That may satisfy some of those who want an operator for building plots. Short of that, I'm not in favor of adding any special operators to Gadfly. Cryptic, special-case syntax can be useful, but there needs to be a very compelling justification.
It's worth mentioning that chaining interfaces well with push!
using Gadfly
using Lazy
using DataFrames
up = DataFrame(x = [1, 2], y = [1, 2])
down = DataFrame(x = [1, 2], y = [2, 1])
@> begin
plot()
push!(@> up layer(x = :x,
y = :y,
Geom.line) )
push!(@> down layer(x = :x,
y = :y,
Geom.line) )
end
Another push! alternative would be to modify layer so that it is inherently iterative.
p = plot()
p_up = layer(p, up, x=:x, y=:y, Geom.line)
p_both = layer(p_up, down, x=:x, y=:y, Geom.line)
It looks like iterative building works for layers but not elements? Could it be possible to do something like this?
p_data = plot(up)
p_aes = element(p_data, x = :x, y = :y)
p_line = element(p_aes, Geom.line)
p_point = element(p_aes, Geom.point)
or, even better,
p_data = plot_data(up)
p_aes = aes(p_data, x = :x, y = :y)
p_line = geom_line(p_aes)
p_point = geom_point(p_aes)
This kind of syntax would also interface well with chaining.
@> begin
plot()
layer( @> begin
up
plot_data
aes(x = :x, y = :y)
geom_line
end )
layer( @> begin
down
plot_data
aes(x = :x, y = :y)
geom_line
end )
end
In fact, @dpastoor, this is exactly the framework used by ggvis
The general strategy would be to build a set of functions which take a plot as an argument and return an enhanced plot. And in this case, a plot is simply a set of instructions and not linked to actual graphics until printed.
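A minimal sketch of that strategy follows, with a toy PlotSpec type that only records instructions. The names PlotSpec, with_data, with_geom and render are hypothetical and chosen for illustration:

```julia
# Hypothetical toy type: a plot is just a list of recorded instructions.
struct PlotSpec
    steps::Vector{String}
end
PlotSpec() = PlotSpec(String[])

# Each function returns a new, extended spec and leaves its input untouched.
with_data(p::PlotSpec, name) = PlotSpec([p.steps; "data:$name"])
with_geom(p::PlotSpec, geom) = PlotSpec([p.steps; "geom:$geom"])

# Nothing is drawn until the spec is rendered (here: turned into a string).
render(p::PlotSpec) = join(p.steps, " -> ")

base  = with_data(PlotSpec(), "up")
line  = with_geom(base, "line")
point = with_geom(base, "point")

println(render(line))    # prints "data:up -> geom:line"
println(render(point))   # prints "data:up -> geom:point"
```

Because base is never modified, it can be reused as the starting point for several variants, which is what makes this style work well with chaining.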
Edit: Never mind, I think the strategy below works much better for this kind of thing.
using Lazy
using DataFrames
using Gadfly
using RDatasets
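# Accumulates positional and keyword arguments for a later function call.
struct Args
    pos::Vector{Any}
    key::Vector{Pair{Symbol,Any}}
end
Args() = Args(Any[], Pair{Symbol,Any}[])
# Return a new Args with the extra positional and keyword arguments appended.
function add(args::Args, pos...; key...)
    Args(Any[args.pos..., pos...], Pair{Symbol,Any}[args.key..., key...])
end
# Apply fun to the accumulated arguments.
function call(args::Args, fun)
    fun(args.pos...; args.key...)
end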
@> begin
Args()
add(dataset("HistData", "ChestSizes"))
add(x = "Chest", y = "Count")
add(Geom.bar)
call(plot)
end
One of the frustrations I've always had with almost all plotting libraries is the verbose and obscure statements needed when producing rich plots, especially those with many annotations and layers.
Gadfly's syntax follows that of ggplot:
This format is good when dealing with very simple plots with only one layer, but it quickly gets out of hand when dealing with more complex plots. Shoving several arguments (almost all of which are optional keyword arguments) into a single function feels like an abuse of Julia's elegant syntax, especially when considering Julia isn't bound to the same syntactical constraints of R.
In my opinion, it's worth considering a more readable and declarative syntax by passing an anonymous function to plot() via a block.
For instance, consider the following example from the manual (http://dcjones.github.io/Gadfly.jl/geom_ribbon.html):
Perhaps a syntax like this would be a better substitute:
The benefit becomes more clear when adding content to the plot:
There are several other benefits:
Anyhow, I hope these points are helpful and I'm not coming off as being too critical. In my opinion, Gadfly is the best plotting tool for Julia. However, if the syntax of Gadfly leveraged Julia's expressive syntax it could become a more appealing alternative to ggplot, Matlab or any other plotting tool.
Thoughts?