JuliaData / DataFrames.jl

In-memory tabular data in Julia
https://dataframes.juliadata.org/stable/

vcat without copy #3336

Open xgdgsc opened 1 year ago

xgdgsc commented 1 year ago

Currently this is possible by wrapping the columns with CatViews and then constructing a DataFrame with copycols=false. Would it be better if this became built in?
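As a rough illustration of the approach described above, the sketch below concatenates two columns lazily and passes the result to the constructor with copycols=false. `LazyVcat` is a hypothetical stand-in written only for this example; it is not the CatViews.jl API, which may differ.

```julia
using DataFrames

# Hypothetical stand-in for a lazy concatenation type such as the one in
# CatViews.jl; this is NOT the CatViews.jl API, just a minimal sketch.
struct LazyVcat{T} <: AbstractVector{T}
    parts::Vector{Vector{T}}   # references to the source vectors, no copies
end

Base.size(v::LazyVcat) = (sum(length, v.parts; init=0),)
Base.IndexStyle(::Type{<:LazyVcat}) = IndexLinear()

function Base.getindex(v::LazyVcat, i::Int)
    @boundscheck checkbounds(v, i)
    # Walk the parts until the index falls inside one of them.
    for p in v.parts
        i <= length(p) && return p[i]
        i -= length(p)
    end
    throw(BoundsError(v, i))
end

df1 = DataFrame(a = [1, 2])
df2 = DataFrame(a = [3, 4])

# Wrap the corresponding columns lazily and skip the copy on construction.
lazy = DataFrame(a = LazyVcat([df1.a, df2.a]); copycols = false)

# The result still reads from the original columns, so this change is visible
# through lazy.a[3].
df2.a[1] = 30
```

Because the column type defines only `size` and `getindex`, the resulting data frame is read-only in practice: nothing here supports adding rows.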

bkamins commented 1 year ago

Typically we prefer to follow the "composability" pattern. This means that users who want such functionality are advised to use an extra package to get it.

An additional point is that most of the time DataFrames.jl users expect operations like push! to work on a data frame. If I understand correctly how CatViews.jl works, that would not be possible. Right?

xgdgsc commented 1 year ago

What happens when a DataFrame constructed with copycols=false is push!ed to? Is there a copy without warning?

I just thought that if construction by columns without copy is already supported, it might make sense to support construction by rows without copy too. Or should all no-copy construction be moved to another package?

I don't know whether issues in lesser-known packages like https://github.com/ahwillia/CatViews.jl/pull/23 would be more discoverable in either case. Composability is nice if the user already knows a lot of packages; https://github.com/mcabbott/LazyStack.jl has an interesting summary of the options, which seem hard to choose between for a beginner (I only discovered this page after fixing CatViews). Indeed it's hard to decide the boundaries, as the lengthy Fixing Package Fragmentation discussion shows.

bkamins commented 1 year ago

Is there a copy without warning?

No, in that case the source object is mutated. In your case it cannot be mutated, so you would get an error.
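A minimal sketch of both cases follows. The view in the second case is only an assumed stand-in for a lazily concatenated column, chosen because it cannot grow either.

```julia
using DataFrames

# Case 1: a plain Vector column. push! grows the source vector itself.
x = [1, 2, 3]
df = DataFrame(a = x; copycols = false)
push!(df, (a = 4,))
# x now has four elements, and df.a is the very same object as x.

# Case 2: a column that cannot grow. A view stands in here for a lazily
# concatenated column (an assumption made for illustration).
y = [1, 2, 3]
dfv = DataFrame(a = view(y, 1:2); copycols = false)
failed = try
    push!(dfv, (a = 4,))
    false
catch
    true   # push! is not defined for a view, so an error is raised
end
```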

it might make sense to support construction by rows without copy too.

Construction by rows must copy, as DataFrame uses column storage internally.
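For illustration, a small sketch of what this means for vcat: each column of the result is a newly allocated vector, so later changes to the result do not reach the inputs.

```julia
using DataFrames

df1 = DataFrame(a = [1, 2])
df2 = DataFrame(a = [3, 4])

# vcat builds a new column holding the rows of both inputs.
stacked = vcat(df1, df2)

# Writing to the result leaves the source data frames untouched.
stacked.a[1] = 100
```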

Indeed it's hard to decide the boundaries

This can be hard indeed. In DataFrames.jl we use the rule (relevant to your case) that we do not introduce new column types; users can use column types they have from other packages. Sadly, even this rule is violated by stack, but that is for historical reasons, from when DataFrames.jl was a catch-all package.