Open tcovert opened 7 years ago
julia> t0 = IndexedTable(Columns(id = [1, 2]), Columns(val = ["a,b", "c,d"]))
julia> t1 = mapslices(t0, ()) do slice
           parts = split(first(slice).val, ",")
           IndexedTable(fill(1, length(parts)), parts)
       end
─────┬────
1 1 │ "a"
1 1 │ "b"
2 1 │ "c"
2 1 │ "d"
julia> select(t1, 1)
──┬────
1 │ "a"
1 │ "b"
2 │ "c"
2 │ "d"
It's required that an IndexedTable have at least 1 dimension, hence the extra dimension returned in mapslices.
Thanks - more complicated than I would have thought.
Would it make sense to add another clause in the mapslices code that just checks if the function returns Columns instead of an IndexedTable, similar to how it differentiates between IndexedTable and scalar return values? That would seem to be easier on the user and would not necessitate dropping an extra key column after the fact.
It would be nice to be able to write something like this:
t1 = mapslices(x->Columns(newval = split(first(x).val, ",")), t0, ())
but that currently triggers the "calling mapslices with no dimensions and scalar return value -- use map instead" error, and the equivalent map statement is only slightly better:
julia> map(x->Columns(newval = split(x.val, ",")), t0)
id │
───┼──────────────────────────────────────────────────────────────────────────
1 │ NamedTuples._NT_newval{SubString{String}}[(newval = "a"), (newval = "b")]
2 │ NamedTuples._NT_newval{SubString{String}}[(newval = "c"), (newval = "d")]
Here is the Query.jl way to do this:
@from i in t0 begin
@select {i.id, names=split(i.val,",")} into i
@from j in i.names
@select {i.id,newval=j}
@collect IndexedTable
end
@tcovert I agree it's harder than it should be... It's easy to add a function that does this: for every row, it would fill a new dimension with 1:length of the vector. What would one call it? This is something like reduce(hcat, vector of vectors) rather than flatten.
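To make the proposal above concrete, here is a minimal sketch in plain Julia of what such a function would compute, using ordinary vectors rather than an IndexedTable. The name `expand_rows` and the three output vectors are hypothetical and not part of IndexedTables; the sketch assumes a recent Julia version and only illustrates the idea of adding a position index 1:length for each row.

```julia
# Input columns, mirroring t0 from the example in this thread.
ids  = [1, 2]
vals = ["a,b", "c,d"]

# Hypothetical helper (not an IndexedTables function): for every row, split
# the value and emit one output row per part, keyed by the original id plus
# a new position index k running over 1:length(parts).
function expand_rows(ids, vals)
    out_id, out_pos, out_val = Int[], Int[], String[]
    for (id, val) in zip(ids, vals)
        for (k, part) in enumerate(split(val, ","))
            push!(out_id, id)
            push!(out_pos, k)
            push!(out_val, String(part))
        end
    end
    return out_id, out_pos, out_val
end

out_id, out_pos, out_val = expand_rows(ids, vals)
```

Because the position index differs within each id, every (id, position) pair is unique, which is what would let the result keep a single row per key.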
@davidanthoff that looks pretty neat! What does the second select do?
This query is actually two queries chained, i.e. a @select ... into i concludes the first query, and then right away starts a new query on those results with i as the range variable. That second query again needs to terminate with a @select statement, so that is what the second @select does.
The really neat thing is the @from in the middle. It is actually flattening the list of lists that the first query creates.
In my mind, this could just be a part of map. In the event that f applied to an element of an IndexedTable evaluates to a Columns(), map would construct a new IndexedTable with those columns as the data and the original index columns as the indices, though I agree that this violates the typical definition of map.
What if map returned an array of IndexedTables in this case and the user could just use reduce(cat, map(...))?
Another approach here that respects the IndexedTable goal of having a single row per key would be to add the newly created column to the set of key columns.
I think this is fundamentally a transformation that corresponds to the SelectMany query operation in LINQ, which the second @from clause in the Query.jl example is. I don't think this fits the semantics of map, which seems to have well defined semantics that don't really fit this case.
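The distinction between map and a SelectMany-style operation can be sketched in plain Julia, without IndexedTables or Query.jl. This assumes Julia 1.x named-tuple syntax; the variable names `rows`, `nested`, and `flat` are illustrative only.

```julia
# Rows as named tuples, mirroring t0 from the example in this thread.
rows = [(id = 1, val = "a,b"), (id = 2, val = "c,d")]

# map keeps one output element per input row, so each element is itself
# a vector of new rows (a list of lists).
nested = map(r -> [(id = r.id, newval = String(p)) for p in split(r.val, ",")], rows)

# A SelectMany-style step concatenates those inner vectors into one flat
# sequence, which is the role the second @from plays in the Query.jl example.
flat = collect(Iterators.flatten(nested))
```

Here `nested` has two elements, one per input row, while `flat` has four, one per split value.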
What is a way of reshaping an IndexedTable? The common use case I have in mind is where a data Column contains a vector of strings, where the typical entry is of the form "a,b" and the goal is to separate the two string values into a longer column of individual string values. Here is a MWE:

Is there a way to get t1 from t0 using the function split? It seems this ought to be somehow possible with map, but I believe the f in map(f, t::IndexedTable) is expected to return a scalar or a NamedTuple, not a Columns object. Does mapslices do this?

The reverse of this operation seems to be do-able using aggregate.

Thanks in advance for any suggestions!