TidierOrg / TidierPlots.jl

Tidier data visualization in Julia, modeled after the ggplot2 R package.
MIT License
196 stars 7 forks source link

should `c()` be deprecated? #90

Closed adknudson closed 2 months ago

adknudson commented 2 months ago

I understand that the point of Tidier.jl is to match the tidyverse in Julia, but should c() really be a method that is provided? It's not necessarily a tidyverse-specific method, and it feels superfluous when a user can just create a vector or tuple, or use one of the other methods like collect, cat, and vcat. At worst it feels anti-Julian, and at best it should be a common method provided by a TidierBase package and re-exported by the Tidier packages.

kdpsingh commented 2 months ago

I intentionally don't provide a c() method in TidierData and agree that either Julia vectors or tuples are preferred.

kdpsingh commented 2 months ago

I haven't looked too closely yet, but if we use c() in arguments that are immutable, then my suggestion would be to move to tuples () rather than arrays because tuples are stack-allocated and thus faster.

rdboyes commented 2 months ago

I do provide a c() method in TidierPlots, because I essentially want people to be able to copy and paste R ggplot code and have it work (after changing aes to @aes, but there's no way around that). It's a pretty easy way to adhere to the stated goal 1 of the package, so I feel pretty strongly that it should be left un-deprecated.

There's nothing stopping anyone from using a vector or tuple instead though if they want, since c() essentially just creates a vector of its arguments!

rdboyes commented 2 months ago

I would be fine with changing c() to return a tuple, but I can't imagine the speedup is detectable for the applications it's used for (e.g. scale_color_manual(values = c("red", "blue", "white")))

kdpsingh commented 2 months ago

It wouldn't be for speed so much as to make it closer to (Julia) style, since c() is more of an R thing rather than a tidyverse thing.

kdpsingh commented 2 months ago

As long as vectors/tuples work and are documented, I think that's fine (if you prefer to leave c() as an option).

But in TidierData, I just use tuples because visually they look like R minus the c letter.

adknudson commented 2 months ago

It doesn't seem like c() is used anywhere in the source code, just in docs and exported as a convenience method. I make the argument that providing this convenience method can cause more issues for end users than it solves. If only TidierPlots has c(), then what happens when users try other TidierOrg packages and get confused when c() isn't found? Either all Tidier packages must export c() or none of them should.

It's a pretty easy way to adhere to the stated goal 1 of the package, so I feel pretty strongly that it should be left un-deprecated.

I don't think keeping c() really aids in goal 1 too much. Users coming from R to Julia will already be needing to adjust to Julia syntax. Goal 1 feels more about keeping ggplot2 syntax and tidyverse verbs than keeping R syntax.

As a compromise, if c() is kept around then it should give a warning along the lines of:

"c() is an R-specific method. The Julia equivalent is to create a vector ([1, 2, 3]) or to use vcat to join multiple vectors (vcat([1,2,3], [4,5,6])), or join multiple vectors using a semicolon ([1:3; 4:6]). "

Lastly if c() is to be kept around, it should behave just like it does in R

The current implementation of c() is incorrect.

In R:

> c(1, "2", 3, c(4, 5, 6))
[1] "1" "2" "3" "4" "5" "6"

In Julia:

julia> c(1, "2", 3, c(4, 5, 6))
4-element Vector{Any}:
 1
  "2"
 3
  [4, 5, 6]
rdboyes commented 2 months ago

I didn't mean for goal one to only apply to verbs or to specific parts of the code - I really meant what it says: "stick as closely to tidyverse syntax and behaviour as possible." To me, that means - especially if it is easy to implement, as it is in this case - that ggplot code written in R should run in TidierPlots.

I'm on board with adding the note about julia vector creation though, that seems in line with the goal of introducing R users to julia.

We can edit c to support those use cases, but they are unlikely to come up in correctly specified ggplot code so I haven't made it a priority

adknudson commented 2 months ago

The c() function can be simply changed to

function c(args...)
    @warn "don't use 'c(1, 2, 3)', use [1, 2, 3]" maxlog=2
    return reduce(vcat, args)
end

which will at least solve the second issue of correctness

rdboyes commented 2 months ago

c has been updated - I will leave the question of whether c should or should not be supported outside of TidierPlots as well to @kdpsingh

kdpsingh commented 2 months ago

Thanks -- I'll think more on this but my inclination for now is to leave things as-is. I'm open to feedback though.

I do want to avoid a TidierBase package for now because ideally each package in the Tidier family should be useful as a standalone package (unless having TidierBase helps reduce re-compilation of re-exported macros like @chain).