Performance of the Monoid instance

I did a bit of benchmarking. The test is to load a frame with N rows, make a list with M copies, make that into one frame via mconcat or toFrame . fmap toList and then do a fold to sum over one column (insuring the final frame is actually used), then test for various N and M. TLDR: Things scale linearly in frame size but mconcat scales almost quadratically in number of frame copies whereas toFrame remains linear. The monoid version is quite a bit faster for small numbers of copies--I'd guess because there is no copying of data?--and the crossover point in my experiment was combining about 500 frames. so the ideal function might be something like

frameConcat :: (RecVec rs, Foldable f, Functor f) => f (FrameRec rs) -> FrameRec rs
frameConcat x = if length x < 500
                then mconcat $ toList x
                else Frames.toFrame . concat $ fmap toList x

Combining so many frames might not seem like a common use case but for me it has been, resulting from map/reduce type operations where the input is a frame and I then process each of many subsets (census tracts, say) in some way which results in a new frame which I then want to recombine into one large frame. It's possible that the right thing is to accumulate the intermediate results as lists of records rather than frames. But I still think Frames should support the large-scale concat.

I'd be happy to submit a PR (that function plus some instructive documentation!) if it would be accepted. If you would accept it, what module would you like that to go in? Maybe InCore?

acowley / Frames

Performance of the Monoid instance #187