Closed korenmiklos closed 2 days ago
`collapse` is fast. Collapsing 10M rows into 10k groups takes about 1.4 seconds:
julia> @time @collapse df mean_x = mean(x), by(y)
  1.352974 seconds (5.04 M allocations: 648.601 MiB, 8.88% gc time, 78.05% compilation time: 45% of which was recompilation)
10001×2 DataFrame
   Row │ y      mean_x
       │ Int64  Float64
───────┼─────────────────────
     1 │     0    500.0
By contrast, `egen` takes about 81 seconds on the same data:
julia> @time @egen df mean_x = mean(x), by(y)
 80.653723 seconds (1.40 M allocations: 839.755 GiB, 12.42% gc time, 0.38% compilation time: 17% of which was recompilation)
10000000×3 DataFrame
      Row │ y      x      mean_x
          │ Int64  Int64  Float64?
──────────┼─────────────────────────────
        1 │     0      1     500.0