Closed bkamins closed 2 years ago
One solution would be to ignore the type of fill when choosing the element type of the column to allocate. But that would be problematic in particular for missing, but also e.g. for fill=1.5 if original columns are integers (less likely). We could special-case missing but that's not ideal.
Is there any reason to think that people may do this kind of thing? Even without CategoricalArray, you'd get a column with element type Any, which is usually not what one wants.
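To make the trade-off concrete, here is a minimal sketch in Base Julia only. It assumes the allocated element type behaves like `promote_type` of the column element type and the type of fill; that is an illustration of the promotion rules, not necessarily what `unstack` does internally. The names `col_eltype`, `with_float`, `with_missing`, `unrelated` and `col` are made up for the example.

```julia
# Illustrative only: Base type promotion, not the DataFrames.jl implementation.
col_eltype = Int64

# A Float64 fill widens an integer column to Float64.
with_float = promote_type(col_eltype, typeof(1.5))
@show with_float      # Float64

# A `missing` fill gives a small Union, which is the case we care most about.
with_missing = promote_type(col_eltype, typeof(missing))
@show with_missing    # Union{Missing, Int64}

# Unrelated types (here Char values and a String fill) fall back to Any.
unrelated = promote_type(Char, typeof("e"))
@show unrelated       # Any

# If the fill type were ignored, the column would stay Vector{Int64}
# and storing `missing` into it would fail on conversion.
col = Vector{col_eltype}(undef, 2)
try
    col[1] = missing
catch err
    @show typeof(err) # MethodError
end
```

The last part is why ignoring the type of fill is problematic for missing: the value simply cannot be stored unless the element type is widened.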
Maybe there is no good solution to this, and we should instead add a comment to the docstring explaining what to do in the special case of categorical columns? (You have to pass a CategoricalValue to keep the column categorical.) Also note that:
julia> df = DataFrame(row=[1,1,2,2], col=["a","b","a","b"],val=categorical('a':'d'))
4×3 DataFrame
 Row │ row    col     val
     │ Int64  String  Cat…
─────┼─────────────────────
   1 │     1  a       a
   2 │     1  b       b
   3 │     2  a       c
   4 │     2  b       d

julia> unstack(df,:row,:col,:val, fill='e')
2×3 DataFrame
 Row │ row    a     b
     │ Int64  Char  Char
─────┼───────────────────
   1 │     1  a     b
   2 │     2  c     d

julia> unstack(df,:row,:col,:val, fill='e').a
2-element Vector{Char}:
 'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
 'c': ASCII/Unicode U+0063 (category Ll: Letter, lowercase)
so if you pass a fill of the correct base type, the values get unwrapped.
This can be closed, right?
OK
@nalimilan - do you have an idea how we could fix this:
or maybe we decide that this is the intended behavior?
This decision also affects https://github.com/JuliaData/DataFrames.jl/pull/3012 where we have the same issue.