JuliaData / SplitApplyCombine.jl

Split-apply-combine strategies for Julia
Other
144 stars 15 forks source link

`group` no longer plays nice with TypedTables #31

Closed fredcallaway closed 3 years ago

fredcallaway commented 3 years ago

The TypedTables docs say that SplitApplyCombine.group will turn results into tables, as in this example

julia> groups = group(getproperty(:lastname), t)
Groups{String,Any,Table{NamedTuple{(:firstname, :lastname, :age),Tuple{String,String,Int64}},1,NamedTuple{(:firstname, :lastname, :age),Tuple{Array{String,1},Array{String,1},Array{Int64,1}}}},Dict{String,Array{Int64,1}}} with 4 entries:
  "King"     => Table with 3 columns and 1 row:…
  "Williams" => Table with 3 columns and 2 rows:…
  "Brown"    => Table with 3 columns and 1 row:…
  "Smith"    => Table with 3 columns and 3 rows:…

However, in v1.1.3, we now get an Array{NamedTuple}. Was this change intentional? If so, I will make an issue at ~SplitApplyCombine~ TypedTables asking them to update their docs (I see that they have Dict rather than Dictionary so perhaps I should raise an issue regardless)

andyferris commented 3 years ago

Thanks. We did some work to make group faster in certain cases that might have broken this... so yeah it would be nice to fix. I suspect SplitApplyCombine is no longer using similar or empty or whatever to construct a generic container, and is instead always creating a Vector.

fredcallaway commented 3 years ago

Indeed, my example ends up here where the output is forced to be a Vector.

I started on a pull request but got stuck when I found that you can't initialize an empty TypedTables.Table. I see two possible solutions:

  1. make an issue at TypedTables.jl asking for an empty table initializer.
  2. map the type of of values over out before returning.

I guess that (2) is not desirable for efficiency. Does (1) make sense to you?

In case it's useful, here's as far as I got

function group(groups::AbstractArray, values::V) where V <: AbstractArray
    I = eltype(groups) # TODO EltypeUnknown
    T = eltype(values) # TODO EltypeUnknown

    if keys(groups) != keys(values)
        throw(DimensionMismatch("dimensions must match"))
    end

    out = Dictionary{I, V}()  # Vector{T} -> V
    @inbounds for i in keys(groups)
        group = groups[i]
        value = values[i]
        push!(get!(V, out, group), value)  # Vector{T} -> V
    end

    return out
end

I think this method should return an object of type Dictionary{I, V}.

andyferris commented 3 years ago

@fredcallaway I just tried to register new versions of SplitApplyCombine (1.1.4) and Typed Tables (1.2.1) - together these should fix the issue.

I still need to update the docs to reflect the change from Base.Dict to Dictionaries.Dictionary...

https://github.com/JuliaRegistries/General/pull/28756 https://github.com/JuliaRegistries/General/pull/28757

EDIT: These have now both merged, so ]pkg up should resolve the problem.

Thanks again for the bug report!

andyferris commented 3 years ago

I started on a pull request but got stuck when I found that you can't initialize an empty TypedTables.Table

The trick I used here was to fall back on Base.emptymutable which should give an AbstractVector that you can push! to. (Unfortunately the implementation now relies a bit more on inference, but at least it is functioning).