RevolutionAnalytics / plyrmr

implicit as.data.frame for guaranteed small result operations #38

Open piccolbo opened 10 years ago

piccolbo commented 10 years ago

So far plyrmr has taken the same approach as rmr2: no implicit transfers from the DFS to the master's RAM and back. But what if the result of an operation is guaranteed to fit in main memory? Take gather: it groups everything into one group, so the data must fit in memory for the final reduce call. No! gather can be composed with another grouping operation, in which case there are multiple reduce calls and the combined output may not fit in RAM. Score one for orthogonality! Anyway, let's document here that this possibility has been considered but, so far, rejected. Unlike SparkR's Reduce, which zaps any existing grouping, gather plays nicely with existing grouping. That makes it possible to write operations like quantile that work on both grouped and ungrouped data and use gather. The alternative, as it was in the past, is for operations to check the grouping state explicitly before applying a gather, but that's boilerplate.
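
To make the composition argument concrete, here is a toy model in plain Python. It is only a sketch and not plyrmr or rmr2 code: the chunked-list representation and the names `group`, `gather` and `group_median` are assumptions made up for illustration. Each row carries its grouping key as a tuple, and each chunk returned by `gather` stands for one reduce call.

```python
from collections import defaultdict
from statistics import median

# Hypothetical model: a dataset is a list of chunks, each chunk a list of
# (key, value) rows, where key is a tuple recording the grouping so far.

def group(chunks, key_fn):
    """Nest a new grouping level by extending every row's key."""
    return [[(key + (key_fn(value),), value) for key, value in chunk]
            for chunk in chunks]

def gather(chunks):
    """Bring rows with the same key into one chunk (one reduce call per key).

    The existing grouping is respected, not discarded.
    """
    by_key = defaultdict(list)
    for chunk in chunks:
        for key, value in chunk:
            by_key[key].append((key, value))
    return [by_key[key] for key in sorted(by_key)]

def group_median(chunks):
    """Quantile-style operation built on gather, with no grouping check."""
    return {chunk[0][0]: median(value for _, value in chunk)
            for chunk in gather(chunks)}

# Ungrouped data spread over three chunks; every row has the empty key.
chunks = [[((), 1), ((), 2)], [((), 3), ((), 4)], [((), 5)]]
grouped = group(chunks, lambda v: v > 3)

print(len(gather(chunks)))    # number of reduce calls without grouping
print(len(gather(grouped)))   # number of reduce calls with grouping
print(group_median(chunks))
print(group_median(grouped))
```

In this model, ungrouped data yields a single chunk after `gather`, while grouped data yields one chunk per group, so nothing bounds the total output size. The same `group_median` works in both cases without inspecting the grouping state.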