TidierOrg / Tidier.jl

Meta-package for data analysis in Julia, modeled after the R tidyverse.
MIT License
524 stars 14 forks source link

Add `@bind_rows` and `@bind_cols` #65

Closed zhezhaozz closed 1 year ago

zhezhaozz commented 1 year ago

Supporting following features

using .Tidier
df1 = DataFrame(A=1:3, B=1:3);
df2 = DataFrame(A=4:6, B=4:6);
df3 = DataFrame(A=7:9, C=7:9);

@chain df1 begin
  @bind_rows(df2, df3)
end

@chain df1 begin
  @bind_rows(df2, df3, id = :id)
end

@chain df1 begin
  @bind_rows(df2, df3, id = "id")
end

@chain df1 begin
  @bind_rows(df2, df3, id = :source => 'a':'c')
end

@chain df1 begin
  @bind_cols(df2, df3)
end
kdpsingh commented 1 year ago

Once you get this working and add docstrings, I would also recommend adding something to the bottom of the "joins" documentation page to show this capability.

I like supporting id = "id" if you've already built support for it but it's not 100% required in our initial version if you haven't yet built it.

I wouldn't show id = :id in the docstrings or docs because I don't want to use symbols in our functions (for consistency within Tidier.jl).

Thanks Zhe for your work on this!

zhezhaozz commented 1 year ago

@kdpsingh id = "id" is supported in @bind_rows now. For id = :id and id = :source => 'a':'c', do you want me to deprecate them or simply not show them in docs?

kdpsingh commented 1 year ago

If you've already built them, I would just leave this functionality in a code comment (for internal awareness) but leave it out of docstrings and docs themselves.

I just want to make sure the API is consistent across the board, which is why I'd like to avoid the use of symbols here.

Also, I don't think I've ever used the id argument in either of these functions. Not saying that no one would ever need this but I think it's okay for us to focus on the simpler use cases.

So let's start with hiding the advanced functionality from docs for now. If we can come up with a way to implement it without symbols, we'll add it back to the docs later in a future PR.

zhezhaozz commented 1 year ago

I just want to make sure the API is consistent across the board, which is why I'd like to avoid the use of symbols here.

Also, I don't think I've ever used the id argument in either of these functions. Not saying that no one would ever need this but I think it's okay for us to focus on the simpler use cases.

Sounds good to me! Agree with the philosophy here.

I also can't remember using id, but just want to make the future implementation easier in case anyone need it.

zhezhaozz commented 1 year ago

Not sure about how to add documentation page. Followed step by step on tutorial, but it just throws me an error: LoadError: UndefVarError: @bind_rows not defined. Does it mean we need to merge and register the Tidier.jl before adding new documentation?

kdpsingh commented 1 year ago

Make sure to add the new macros to the list of exported functions at the top of the Tidier.jl file (the main package file).

zhezhaozz commented 1 year ago

Make sure to add the new macros to the list of exported functions at the top of the Tidier.jl file (the main package file).

I already added them and the error persists.

Does it mean we need to merge and register the Tidier.jl before adding new documentation?

The reason I think this is the problem is that I also have this error about backtick notation, which you just implemented yesterday:

Error: failed to run `@example` block in src/examples/generated/UserGuide/column_names.md:32-36
│ ```@example column_names
│ @chain df begin
│   @mutate(`age in 10 years` = `my age` + 10)
│ end
│ ```
│   value =
│    LoadError: MethodError: no method matching parse_function(::Expr, ::Expr; autovec=true, subset=false)
│    Closest candidates are:
│      parse_function(::Symbol, ::Expr; autovec, subset) at ~/.julia/packages/Tidier/GVzRd/src/parsing.jl:87
│    in expression starting at column_names.md:34
└ @ Documenter.Expanders ~/.julia/packages/Documenter/H5y27/src/Utilities/Utilities.jl:32

Other macros' examples work well, except @bind_rows, @bind_cols and parse_function() for backticks.

kdpsingh commented 1 year ago

This was passing tests in your earlier commit to this PR so I think something may have gotten messed up on your end. Happy to take a look if you push your latest commit.