MakieOrg / AlgebraOfGraphics.jl

Combine ingredients for a plot
https://aog.makie.org
MIT License
407 stars 43 forks source link

Name clash with `data` #346

Open KronosTheLate opened 2 years ago

KronosTheLate commented 2 years ago

AlgebraOfGraphics.jl exports data. But it is one of the most common variable names, leading to frequent name clashes. I do not see what the best solution is, but as I find this troublesome, I wanted to raise the issue.

piever commented 2 years ago

I was convinced there was already an issue for this but could not find it. It could make sense to also rename data as part of the breaking changes discussed at #345. Maybe with(df) could make sense? It is analogous to DataFramesMeta.@with, and with is probably less likely to be a variable.

KronosTheLate commented 2 years ago

Agreed. I actually think that with is a pretty great alternative.

I can also think of the very explicit, but a little long, 'with_data' or 'withdata'. I am not sure it is better than 'with', but wanted to throw it out there.

knuesel commented 2 years ago

Yes this was reported before in https://github.com/JuliaPlots/AlgebraOfGraphics.jl/issues/207

I think table(...) would fit well, but it's more likely than with to be a variable. On the other hand I would not be surprised if a with function is defined by packages to be used with the do block syntax.

knuesel commented 2 years ago

Another option: since data is rather redundant (df is of course the data) it could be left out entirely:

plt = df * mapping(:a, :b) * visual(Lines)

So AoG would define Base.:*(df, l::Layer) = data(df) * l, etc. and data would be unexported.

piever commented 2 years ago

I confess I would be a little hesitant to overload that method, but I was wondering whether another option could be to make Layer callable, so that one would do eg

df |> mapping(:a, :b) * lines |> draw
storopoli commented 2 years ago

I confess I would be a little hesitant to overload that method, but I was wondering whether another option could be to make Layer callable, so that one would do eg

df |> mapping(:a, :b) * lines |> draw

This is powerful, expressive and concise! I also prefer this API. Would this work in every DataFrame or just the ones named exclusively df?

jkrumbiegel commented 2 years ago

I still have to read through https://github.com/JuliaPlots/AlgebraOfGraphics.jl/discussions/345 but to me, it's confusing that you can draw a plot without data (pregrouped) by supplying the data directly to the mapping.

Therefore, I'd think either that functionality should move out of mapping for example by doing data(pregrouped_vectors) * mapping(1, 2) for specifying the vectors via indices. Or the data functionality could move inside of mapping because a mapping only really works by referencing a data source right? So to me it feels natural to pull the two together.

mapping(:x, :y, data = df) * visual(...)
knuesel commented 2 years ago

@piever for the overloading of * do you mean because it feels weird? (From a type piracy point of view it's legal as long as AoG owns one of the * argument types...)

Anyway I love the pipe version! I was first hesitant because currently the datasets are regular players in the algebra, for example we can do

(data(df1)*mapping(:a, :b) + data(df2)*mapping(:x, :y)) * visual(markersize=10)

but actually it also works with pipes:

((df1 |> mapping(:a, :b)) + (df2 |> mapping(:x, :y))) * visual(markersize=10)

(This takes a few more parentheses but for such a rare use case it's no big deal.)

I think the semantics of piping are a great fit here, and callable layers make sense even without pipes:

line_plot = mapping(:a, :b) * visual(Lines, linewidth=3)
plt = line_plot(df)  # Can be called on different datasets

If the pipe solution is adopted, it might be nice to add a "curried" version of draw to allow things like

df |> mapping(:a, :b) * lines |> draw(axis=(300, 200))

@storopoli the variable name doesn't matter (it's only for macros that the variable name can make a difference. Edit: and some odd cases like f(; a) and (; a) = x).

piever commented 2 years ago

for the overloading of * do you mean because it feels weird? (From a type piracy point of view it's legal as long as AoG owns one of the * argument types...)

It is legal, but I suspect it is at least discouraged, in that it is a bit "punny" and bound to create method errors.

I was first hesitant because currently the datasets are regular players in the algebra

Yes, that's the reason why it was not done via piping. But, as you point out, it's still nice to work with different datasets, eg

(mapping(:a, :b)(df1) + mapping(:x, :y)(df2)) * visual(markersize=10)

And in general, I think layer1(df1) + layer2(df2) reads reasonably well.

I wonder whether it could give rise to confusion if users wrote mapping(:a) * mapping(:b)(df) and thought df would only apply to mapping(:b). Maybe it's enough to document this and error if users try passing several datasets in the same layer, eg mapping(:a)(df1) * mapping(:b)(df2). These are all edgecases anyway, as I imagine the most common would be df |> layers |> draw.

I still have to read through https://github.com/JuliaPlots/AlgebraOfGraphics.jl/discussions/345 but to me, it's confusing that you can draw a plot without data (pregrouped) by supplying the data directly to the mapping.

Doing pregrouped data from a dataframe will definitely be supported (whereas it isn't right now). There is some discussion in #345 (we should probably continue this particular discussion there) as to whether plotting from vectors, eg mapping(rand(10), rand(10)) |> draw, should we supported with mapping or with another name, but I think we should allow it for users who do not generally work with tables.