Closed bkamins closed 4 years ago
They should definitely not be moved.
We all know that it is best practice not to refer to columns by their column number, but we also know that introductory users routinely use df[:, 5] when referring to a specific column.
Moving columns would probably cause bugs that are hard to track down, especially because less experienced users are the ones most likely to run into them.
Plus, Stata does not reorder columns, and neither does R:
r$> df = tibble(x = runif(5), g = c(1, 1, 1, 2, 2))
r$> t = df %>%
      group_by(g) %>%
      mutate(y = x - mean(x)) %>%
      ungroup()
r$> t
# A tibble: 5 x 3
x g y
<dbl> <dbl> <dbl>
1 0.455 1 -0.0984
2 0.505 1 -0.0485
3 0.701 1 0.147
4 0.0432 2 -0.457
5 0.958 2 0.457
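The positional-indexing concern raised above can be sketched in Julia. This is only an illustration under stated assumptions: it reuses the small y/x table from this thread's Julia example, and it simulates a column move with `select(df, :x, :y)` rather than reproducing the actual grouped `select` behavior under discussion.

```julia
using DataFrames

# Same toy table as the Julia example in this thread.
df = DataFrame(y = 1:4, x = ["b", "a", "b", "a"])

# Hypothetical user code that addresses a column by position.
by_position = df[:, 1]

# Assumption for illustration: some operation returns the same data
# with the grouping column :x moved to the front.
reordered = select(df, :x, :y)

# The same positional index now silently returns a different column,
# while access by name is unaffected.
moved_by_position = reordered[:, 1]
by_name = reordered[:, :y]
```

Here `by_position` holds the `y` values, whereas `moved_by_position` holds the `x` values. No error is raised, which is what makes this kind of bug hard to track down.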
Thank you for the quick response. I classify it as a bug because it is also related to, for example:
julia> df = DataFrame(y = 1:4, x = ["b", "a", "b", "a"])
4×2 DataFrame
│ Row │ y │ x │
│ │ Int64 │ String │
├─────┼───────┼────────┤
│ 1 │ 1 │ b │
│ 2 │ 2 │ a │
│ 3 │ 3 │ b │
│ 4 │ 4 │ a │
julia> select(groupby(df, :x), ungroup=false)
GroupedDataFrame with 2 groups based on key: x
First Group (2 rows): x = "b"
│ Row │ x │
│ │ String │
├─────┼────────┤
│ 1 │ b │
│ 2 │ b │
⋮
Last Group (2 rows): x = "a"
│ Row │ x │
│ │ String │
├─────┼────────┤
│ 1 │ a │
│ 2 │ a │
julia> select!(groupby(df, :x), ungroup=false)
GroupedDataFrame with 2 groups based on key: Error showing value of type GroupedDataFrame{DataFrame}:
ERROR: BoundsError: attempt to access 1-element Array{Symbol,1} at index [[2]]
I will submit a patch soon.
Unfortunately, this is linked with https://github.com/JuliaData/DataFrames.jl/issues/2297, so it will take a bit more time to fix and will require a minor release.
In 0.21 we have:
which means that the grouping column goes first. The question is whether we want to keep this behavior or insist that the original columns of df stay in their order in the parent of the GroupedDataFrame?
CC @pdeffebach @matthieugomez