JuliaData / IndexedTables.jl

Flexible tables with ordered indices
https://juliadb.org
MIT License
120 stars 38 forks source link

select Tuple{Selection} does not seem to work as per docs #279

Closed akdor1154 closed 3 years ago

akdor1154 commented 3 years ago

Trying to use select(table, (:Name => ::Selection,)) does not seem to work as per my interpretation of the docs. From the documentation:

Selection is a type union of many types that can select from a table. It can be:

  1. Integer – returns the column at this position.
  2. Symbol – returns the column with this name.
  3. Pair{Selection => Function} – selects and maps a function over the selection, returns the result.
  4. AbstractArray – returns the array itself. This must be the same length as the table.
  5. Tuple of Selection – returns a table containing a column for every selector in the tuple. The tuple may also contain the type Pair{Symbol, Selection}, which the selection a name. The most useful form of this when introducing a new column.
  6. Regex – returns the columns with names that match the regular expression.
  7. Type – returns columns with elements of the given type.

As per this, I would expect to be able to select columns under a new name using select. E.g. If I call select(table, (:Name => :OldName,)), as this satisfies the type Tuple{Pair{Symbol, Selection}}. Is this meant to work?

joshday commented 3 years ago

It looks like nested Pairs are required to hit the correct method. Using Pair{Symbol, AnythingThatIsNotPair} attempts to call the second argument (selection type 3). There's a missing method somewhere.

Here's a workaround:

select(table, (:Name => :OldName => identity,))
akdor1154 commented 3 years ago

I'm using the following overload as a workaround:

function IndexedTables.select(table::IndexedTable, selection::Tuple{Vararg{Pair{Symbol, <:Any}}})
    names = [fst for (fst, snd) in selection] |> Base.splat(tuple)
    cols = [snd for (fst, snd) in selection] |> Base.splat(tuple)

    selected = IndexedTables.select(table, cols)
    renameList = (zip(colnames(selected), names) .|> Base.splat(Pair))
    return IndexedTables.rename(selected, renameList...)
end

If you can point me towards where the missing method should be, I'd have a play with a PR to fix this. I found it difficult to work out where it should go though - e.g. getindex on a ColDict?