@groupby on a function of a field

zygmuntszpak commented 6 years ago

I would like to be able to group on the result of applying a function to one of the columns. For example, suppose that I have a column :DATETIME which stores the year/month/day/h/m/s. In some queries I might want to group on the DATE only, whereas in other queries I might want to group on the TIME.

Hence, I would like to write something like this:

@apply t begin
    @groupby Dates.Date.(:DATETIME) {length = length(_)}
end

Is this type of operation already supported, and I am just using the wrong syntax? As a workaround I could always add more columns using @transform to explicitly split DATETIME into DATE and TIME, but I was wondering whether there is another solution.

piever commented 6 years ago

I think that I should simply allow the syntax you're using (actually, I think it would have to be Dates.Date(:DATETIME) without the dot, since the operation is already applied element-wise). In the meantime, JuliaDB supports passing a selection to groupby, and in JuliaDBMeta you can use the @=> macro to build that selection more easily:

julia> using JuliaDBMeta

julia> iris = loadtable(Pkg.dir("JuliaDBMeta", "test", "tables", "iris.csv"));

help?> @=>
  @=>(expr...)

  Create a selector based on expressions expr. Symbols are used to select columns and infer an
  appropriate anonymous function. In this context, _ refers to the whole row. To use actual symbols,
  escape them with ^, as in ^(:a). Use cols(c) to refer to field c where c is a variable that
  evaluates to a symbol. c must be available in the scope where the macro is called.

     Examples
    ==========

  julia> t = table(@NT(a = [1,2,3], b = [4,5,6]));

  julia> select(t, @=>(:a, :a + :b))
  Table with 3 rows, 2 columns:
  a  a + b
  ────────
  1  5
  2  7
  3  9

julia> select(iris, @=>(:Species == "setosa"))
150-element Array{Bool,1}:
  true
  true
  true
  true
  true
  true
  true
  true
  true
  true
     ⋮
 false
 false
 false
 false
 false
 false
 false
 false
 false

julia> @groupby iris @=>(:Species=="setosa") {length = length(_)}
Table with 2 rows, 2 columns:
Species == "setosa"  length
───────────────────────────
false                100
true                 50

Note that the @=> macro is not restricted to JuliaDBMeta functions: you can also use it with plain JuliaDB:

julia> groupby(length, iris, @=>(:Species=="setosa"))
Table with 2 rows, 2 columns:
Species == "setosa"  length
───────────────────────────
false                100
true                 50
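As a rough illustration of what grouping on a selector computes, here is a plain-Julia sketch that does not use JuliaDB at all. It assumes the rows are a vector of NamedTuples, and the helper `groupcount` is hypothetical: it applies a key function to each row and counts the rows per key, mirroring what `groupby(length, iris, @=>(:Species == "setosa"))` returns.

```julia
# Hypothetical helper, not part of JuliaDB: group rows by `key(row)`
# and count how many rows fall into each group.
function groupcount(key, rows)
    counts = Dict{Any,Int}()
    for row in rows
        k = key(row)
        counts[k] = get(counts, k, 0) + 1
    end
    return counts
end

# A tiny made-up stand-in for the iris table (not the real data).
rows = [(Species = "setosa",), (Species = "setosa",), (Species = "virginica",)]

# The anonymous function plays the role of @=>(:Species == "setosa").
counts = groupcount(row -> row.Species == "setosa", rows)
```

The key can be any function of the row, which is exactly what @=> builds for you from an expression over column symbols.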

zygmuntszpak commented 6 years ago

Thank you very much for the clarification, and for this great package. For the cursory reader, the following is a solution to my example.

@apply r begin
    @groupby @=>(Dates.Date(:DATETIME)) {length = length(_)}
end
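For readers without JuliaDB at hand, the same idea can be sketched in plain Julia using only the Dates standard library. This is not the JuliaDBMeta API: the helper `countby` and the sample timestamps are hypothetical. It counts values per `Date(dt)` (calendar date only) or per `Time(dt)` (time of day only), which are the two groupings the original question asks about.

```julia
using Dates

# Made-up timestamps standing in for the :DATETIME column.
datetimes = [
    DateTime(2018, 3, 1, 9, 30, 0),
    DateTime(2018, 3, 1, 17, 0, 0),
    DateTime(2018, 3, 2, 9, 30, 0),
]

# Hypothetical helper: count elements per value of f(x),
# analogous to grouping on @=>(f(:DATETIME)) and taking length.
function countby(f, xs)
    counts = Dict{Any,Int}()
    for x in xs
        k = f(x)
        counts[k] = get(counts, k, 0) + 1
    end
    return counts
end

by_date = countby(Date, datetimes)  # group on the date part only
by_time = countby(Time, datetimes)  # group on the time-of-day part only
```

Swapping the key function is all it takes to switch between date-based and time-based grouping, with no extra columns added via @transform.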

JuliaData / JuliaDBMeta.jl

@groupby on a function of a field #27