JuliaData / DataFrames.jl

In-memory tabular data in Julia
https://dataframes.juliadata.org/stable/
1.73k stars, 367 forks

filter performance #3460

Closed: sprig closed this issue 1 month ago

sprig commented 1 month ago

Hello,

Thanks for your work on this package!

I came across your comment on the Julia forums about the performance of filter on whole data frames versus column selectors: https://discourse.julialang.org/t/adding-multiple-new-columns-to-dataframe/50997/14

I have read the documentation for filter, both in the installed version and on the docs site, and now also the docstring directly in the code (https://github.com/JuliaData/DataFrames.jl/blob/52d5a62ce6c4742b50fcc66478f0883142acf295/src/abstractdataframe/abstractdataframe.jl#L1124). I see no mention of this, except perhaps indirectly: one could infer it from the fact that rows, rather than columns, are passed to the function.
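For reference, the two forms of filter being compared look roughly like this. This is a minimal sketch assuming DataFrames.jl 1.x; the data frame and its columns x and y are made up for illustration, and no timings are shown.

```julia
using DataFrames

# Hypothetical example data.
df = DataFrame(x = 1:6, y = [3, 1, 4, 1, 5, 9])

# Row form: the predicate receives one DataFrameRow per row, and
# row.x / row.y are looked up dynamically on every call.
slow = filter(row -> row.x > row.y, df)

# Column-selector form: the selected columns are extracted first and
# the predicate receives their elements as separate arguments, so the
# compiler can specialize on the concrete element types.
fast = filter([:x, :y] => (x, y) -> x > y, df)

# Both forms keep the same rows.
@assert slow == fast
```

Here only rows 2 and 4 satisfy x > y, so both results contain x = [2, 4].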

I assume this still holds, e.g. because of type stability and because unpacking rows is slower than accessing the column vectors directly? Could you please confirm? And do you think it would be prudent to state this explicitly?

bkamins commented 1 month ago

Yes, the reason is that DataFrameRow is not type stable. This information can be added to the docstring. Would you be willing to make a PR? (Otherwise I can do it.)
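One way to see the type instability is to ask the compiler what it infers for a field access on a DataFrameRow. This is a sketch assuming DataFrames.jl 1.x; the data frame and the helper count_big are hypothetical, and Base.return_types is used only as a quick inference check.

```julia
using DataFrames

# Hypothetical example data.
df = DataFrame(x = 1:3, y = [1.0, 2.0, 3.0])

# A DataFrameRow only references its parent data frame, whose column
# types are not part of its type, so row.x cannot be inferred concretely.
row_type = typeof(first(eachrow(df)))
inferred = only(Base.return_types(r -> r.x, (row_type,)))
println("inferred type of row.x: ", inferred)  # not a concrete type

# Passing the column vector itself into a function (a "function
# barrier") lets Julia compile a specialized method for the concrete
# column type.
count_big(v::AbstractVector) = count(>(1), v)
println(count_big(df.x))
```

This is the same effect that makes the column-selector form of filter faster: the predicate then works on elements of concretely typed columns rather than on DataFrameRow objects.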

sprig commented 1 month ago

Sure!