TidierOrg / Tidier.jl

Meta-package for data analysis in Julia, modeled after the R tidyverse.
MIT License

Add `@glimpse` and `@relocate` #80

Closed: zhezhaozz closed this 1 year ago

zhezhaozz commented 1 year ago

These two macros are nice-to-have features, so I would like to add them.

It seems that DataFrames.jl does not have a feature similar to tidyverse's glimpse(). The workaround I propose is to use describe() to compute summary information for each column and force it to print all rows of the summary table to stdout. For example:

julia> df1 = DataFrame(a=1:3, b=1:3, c=4:6, d=4:6, e=7:9, f1=["7", "8","9"], f2=7:9);

julia> @glimpse(df1)
7×7 DataFrame
 Row │ variable  eltype    mean    min  median  max  nmissing 
     │ Symbol    DataType  Union…  Any  Union…  Any  Int64    
─────┼────────────────────────────────────────────────────────
   1 │ a         Int64     2.0     1    2.0     3           0
   2 │ b         Int64     2.0     1    2.0     3           0
   3 │ c         Int64     5.0     4    5.0     6           0
   4 │ d         Int64     5.0     4    5.0     6           0
   5 │ e         Int64     8.0     7    8.0     9           0
   6 │ f1        String            7            9           0
   7 │ f2        Int64     8.0     7    8.0     9           0
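A minimal sketch of the describe()-based workaround, for illustration only. The helper name `glimpse_describe` is hypothetical and is not the macro's actual implementation; it assumes DataFrames.jl 1.x, where `describe` accepts statistic names as symbols and `show` accepts the `allrows` and `allcols` keywords.

```julia
using DataFrames

df1 = DataFrame(a=1:3, b=1:3, c=4:6, d=4:6, e=7:9, f1=["7", "8", "9"], f2=7:9)

# Hypothetical helper (an assumption for illustration, not Tidier.jl code).
function glimpse_describe(io::IO, df::AbstractDataFrame)
    # One summary row per column of `df`.
    stats = describe(df, :eltype, :mean, :min, :median, :max, :nmissing)
    # allrows/allcols disable truncation, so every summary row is printed.
    show(io, stats; allrows=true, allcols=true)
    println(io)
    return stats
end

glimpse_describe(stdout, df1)
```

Returning the summary table as well as printing it keeps the result usable in later steps.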

Right now, @relocate supports the following use cases:

@relocate(df1, d)
@relocate(df1, d, f)
@relocate(df1, f1, before = a)
@relocate(df1, f, before = (b, c))
@relocate(df1, contains("f"), after = a:c)
@relocate(df1, f1, before = contains("b"))
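For reference, the reordering behind these calls can be sketched with plain DataFrames.jl functions. The function `relocate_sketch` and its `before`/`after` keywords are hypothetical names chosen to mirror the macro calls above; this is not how the macro itself is implemented, and it takes DataFrames.jl column selectors rather than bare column names.

```julia
using DataFrames

df1 = DataFrame(a=1:3, b=1:3, c=4:6, d=4:6, e=7:9, f1=["7", "8", "9"], f2=7:9)

# Hypothetical function (an assumption for illustration, not Tidier.jl code).
# `cols`, `before` and `after` may be any DataFrames.jl column selector.
function relocate_sketch(df::AbstractDataFrame, cols; before=nothing, after=nothing)
    if before !== nothing && after !== nothing
        throw(ArgumentError("pass only one of `before` or `after`"))
    end
    moved = names(df, cols)           # columns to move, in their current order
    rest = setdiff(names(df), moved)  # remaining columns, order preserved
    pos = 1                           # default: move to the front
    if before !== nothing
        anchor = names(df, before)
        pos = something(findfirst(in(anchor), rest), 1)
    elseif after !== nothing
        anchor = names(df, after)
        pos = something(findlast(in(anchor), rest), 0) + 1
    end
    neworder = vcat(rest[1:pos-1], moved, rest[pos:end])
    return select(df, neworder)
end

names(relocate_sketch(df1, :d))
names(relocate_sketch(df1, :f1; before=:a))
names(relocate_sketch(df1, r"f"; after=Between(:a, :c)))
```

With several anchor columns, `before` places the moved columns ahead of the first anchor and `after` places them behind the last one.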

zhezhaozz commented 1 year ago

I think that in the long term we need something similar to tidy-select modifiers to support features like where(is.character), selecting based on a vector, or selecting based on a regex.
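As a point of comparison, DataFrames.jl already offers selectors that cover these three cases. The sketch below assumes DataFrames.jl 1.x and shows only the underlying calls; how Tidier.jl would expose them through its macros is left open.

```julia
using DataFrames

df1 = DataFrame(a=1:3, b=1:3, c=4:6, d=4:6, e=7:9, f1=["7", "8", "9"], f2=7:9)

# Roughly where(is.character): names of columns whose element type is a string type.
string_cols = names(df1, AbstractString)
by_type = select(df1, string_cols)

# Selecting based on a vector of column names.
wanted = [:a, :c]
by_vector = select(df1, wanted)

# Selecting based on a regex matched against column names.
by_regex = select(df1, r"^f")
```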

kdpsingh commented 1 year ago

Agree that I need to build support for across(where(...)).

I would like @glimpse to do more than simply wrap describe(), because glimpse() in R shows you the first few values of each vector and gives you its type. This shouldn't be too hard to recreate. The result could either be a data frame, or we could print to the console using @info() or println().

So I would either leave it out of this PR or make it work more like its dplyr counterpart.
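A rough sketch of the println() route, to make the idea concrete. The function `glimpse_sketch` and its `n` keyword are hypothetical, and the layout only approximates dplyr's output; it assumes Julia 1.6 or later for the `init` keyword of `maximum`.

```julia
using DataFrames

df1 = DataFrame(a=1:3, b=1:3, c=4:6, d=4:6, e=7:9, f1=["7", "8", "9"], f2=7:9)

# Hypothetical function (an assumption for illustration, not Tidier.jl code).
function glimpse_sketch(io::IO, df::AbstractDataFrame; n::Integer=5)
    println(io, "Rows: ", nrow(df))
    println(io, "Columns: ", ncol(df))
    colnames = names(df)
    coltypes = [string(eltype(col)) for col in eachcol(df)]
    name_width = maximum(length, colnames; init=0)
    type_width = maximum(length, coltypes; init=0)
    for (name, coltype, col) in zip(colnames, coltypes, eachcol(df))
        # Show at most the first `n` values of each column.
        preview = join((repr(x) for x in Iterators.take(col, n)), ", ")
        println(io, rpad(name, name_width), "  ", rpad(coltype, type_width), "  ", preview)
    end
    return nothing
end

glimpse_sketch(stdout, df1)
```

Taking an `io` argument keeps the function easy to capture with `sprint` when testing.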

kdpsingh commented 1 year ago

I think describe() lets you provide custom functions so we may even be able to use describe() to create the dplyr glimpse-like output (for example by using first()).
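A small sketch of that idea, assuming DataFrames.jl 1.x, where `describe` accepts `function => name` pairs alongside the built-in statistic symbols. The column name `:preview` and the choice of three values are arbitrary. `Iterators.take` is used in place of `first` so the function also works on the `skipmissing` wrapper that `describe` passes for columns allowing missing values.

```julia
using DataFrames

df1 = DataFrame(a=1:3, b=1:3, c=4:6, d=4:6, e=7:9, f1=["7", "8", "9"], f2=7:9)

# Custom statistic: the first three values of a column, joined into one string.
preview(col) = join(Iterators.take(col, 3), ", ")

# One row per column with its element type and a short preview of its values.
g = describe(df1, :eltype, preview => :preview)
```

Because the result is an ordinary data frame, it can be printed in full with `show(g; allrows=true)`.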

zhezhaozz commented 1 year ago

I will add the two macros in separate PRs.