So far the plyrmr approach has been the same as rmr2's: no implicit transfers from the DFS to the master's RAM and back. But what if the result of an operation is guaranteed to fit in main memory? Take `gather`: it groups everything into one group, hence everything must fit in memory for the last reduce call. NO! `gather` could be composed with another grouping operation, hence there could be multiple reduce calls, hence the combined output might not fit in RAM. Score one for orthogonality! Anyway, let's document here that this possibility has been considered but so far rejected. Unlike sparkR's Reduce, which will zap any existing grouping, `gather` plays nicely with existing grouping. That allows writing operations like quantile that work for both grouped and ungrouped data and use `gather`; the alternative could be, as it was in the past, that operations check for the grouping state explicitly before applying a `gather`, but that's boilerplate.
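To make the composition argument concrete, here is a minimal in-memory sketch in Python. It is a toy model, not plyrmr's API or implementation: the names `ungrouped`, `group`, `gather`, and `group_median` are hypothetical, and a "reduce call" is modeled simply as one bucket of values sharing a key. It only illustrates why a `gather` that respects existing grouping can produce several reduce calls, and why one definition of an operation then serves both grouped and ungrouped data.

```python
from collections import defaultdict
from statistics import median


def ungrouped(values):
    """Wrap raw values as (key, value) records; the empty key means no grouping."""
    return [((), v) for v in values]


def group(records, key_fn):
    """Refine the current grouping by extending each key with key_fn(value)."""
    return [(key + (key_fn(value),), value) for key, value in records]


def gather(records):
    """Bring together all records that share a key.

    Each bucket in the result stands for one reduce call (an assumption of
    this toy model). Ungrouped data yields exactly one bucket; grouped data
    yields one bucket per group, so the existing grouping is preserved.
    """
    buckets = defaultdict(list)
    for key, value in records:
        buckets[key].append(value)
    return dict(buckets)


def group_median(records):
    """Written once on top of gather; never inspects the grouping state."""
    return {key: median(values) for key, values in gather(records).items()}


data = ungrouped([3, 1, 4, 1, 5, 9, 2, 6])

# Ungrouped: a single reduce call sees all the data.
overall = group_median(data)

# Grouped by parity: gather keeps the grouping, so there are two reduce calls.
by_parity = group_median(group(data, lambda v: v % 2))

print(overall)    # {(): 3.5}
print(by_parity)  # {(1,): 3, (0,): 4}
```

In this model the number of buckets returned by `gather` equals the number of existing groups, so the total output grows with the number of groups even though each single reduce call fits in memory.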