amplab / training

Training materials for Strata, AMP Camp, etc

what is a reducer? #204

Closed by sbenthall 8 years ago

sbenthall commented 8 years ago

In "Data Exploration..." there is this paragraph:

"Next, we shuffle the data and group all values of the same key together. Finally we sum up the values for each key. There is a convenient method called reduceByKey in Spark for exactly this pattern. Note that the second argument to reduceByKey determines the number of reducers to use. By default, Spark assumes that the reduce function is commutative and associative and applies combiners on the mapper side. Since we know there is a very limited number of keys in this case (because there are only 3 unique dates in our data set), let’s use only one reducer."

I'm a newbie and don't understand this paragraph. What is a reducer?
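
For anyone else stuck on the same paragraph, here is a rough sketch of what the quoted text seems to describe, written in plain Python rather than Spark so it runs anywhere. It is a simplified model, not how Spark is implemented: `reduce_by_key`, `partitions`, and the sample dates are all made up for illustration. In this model a "reducer" is one of the tasks that receives, after the shuffle, all partial values for a subset of the keys and merges them; `num_reducers` plays the role of the second argument to reduceByKey.

```python
from collections import defaultdict
from operator import add


def reduce_by_key(partitions, func, num_reducers):
    """Toy model of reduceByKey over a list of input partitions.

    Each partition is a list of (key, value) pairs. Hypothetical helper,
    not part of Spark.
    """
    # 1. Mapper side: each input partition pre-combines its own values per key.
    #    This is only safe because func is commutative and associative.
    combined = []
    for part in partitions:
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        combined.append(local)

    # 2. Shuffle: every key is routed to exactly one reducer, chosen by hash.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for local in combined:
        for key, value in local.items():
            buckets[hash(key) % num_reducers][key].append(value)

    # 3. Reducer side: each reducer merges the partial results for its keys.
    results = []
    for bucket in buckets:
        out = {}
        for key, values in bucket.items():
            acc = values[0]
            for v in values[1:]:
                acc = func(acc, v)
            out[key] = acc
        results.append(out)
    return results


# Made-up (date, count) pairs spread over two input partitions,
# with only 3 unique dates as in the tutorial's data set.
partitions = [
    [("20090505", 1), ("20090506", 1), ("20090505", 1)],
    [("20090507", 1), ("20090505", 1), ("20090506", 1)],
]

one_reducer = reduce_by_key(partitions, add, 1)      # one output dict
three_reducers = reduce_by_key(partitions, add, 3)   # three output dicts
```

With one reducer, all three dates end up in a single output; with three reducers the same totals are spread over three outputs, some of which may be empty. With so few keys, extra reducers add little, which appears to be the tutorial's reason for using only one.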