Yelp / mrjob

Run MapReduce jobs on Hadoop or Amazon Web Services
http://packages.python.org/mrjob/

How to launch more than one reducer to execute a job? #2181

Open ParadoxZW opened 4 years ago

ParadoxZW commented 4 years ago

I wrote the following code to do a word-sorting task:

#!/usr/bin/python
# -*- coding: utf-8 -*-
from mrjob.job import MRJob

class MRwordCount(MRJob):
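    def mapper(self, in_key, in_value):
        # Group the words on this line by lowercase first letter (a-z).
        bins = {chr(i): [] for i in range(97, 123)}
        for word in in_value.split():  # split() skips empty strings
            key_j = word[0].lower()
            if key_j in bins:  # ignore words that do not start with a-z
                bins[key_j].append(word)
        for key_j, value_j in bins.items():
            yield (key_j, sorted(value_j))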

    def reducer(self, key, value_list):
        words = []
        for value in value_list:
            words += value
        words = sorted(words)
        yield (key, words)

if __name__ == '__main__':
    MRwordCount.run()

When I run this demo on Hadoop, only one reducer is launched. How can I launch more than one reducer in mrjob? I'm new to Hadoop, so this question may be trivial, but I would really appreciate any help. Thanks very much.
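
For context, Hadoop takes the number of reduce tasks from the job configuration (the `mapreduce.job.reduces` property in Hadoop 2 and later), and mrjob can pass such properties through a job's `JOBCONF` attribute or the `--jobconf` switch. The sketch below uses neither mrjob nor Hadoop. It only simulates, in plain Python, how the letter keys from the job above might be divided among several reducers. The names `partition` and `simulate`, and the `ord`-based hash, are assumptions made for illustration and are not Hadoop's actual partitioner.

```python
from collections import defaultdict


def partition(key, num_reducers):
    """Hypothetical stand-in for a hash partitioner: map a key to a reducer index."""
    return sum(ord(c) for c in key) % num_reducers


def simulate(lines, num_reducers):
    """Run map, shuffle, and reduce in-process; return one output dict per reducer."""
    # Map + shuffle: route each word to the reducer that owns its first letter.
    shuffled = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:
        for word in line.split():
            key = word[0].lower()
            if 'a' <= key <= 'z':  # ignore words that do not start with a-z
                shuffled[partition(key, num_reducers)][key].append(word)
    # Reduce: each reducer sorts the words for the keys it received.
    return [{key: sorted(words) for key, words in part.items()}
            for part in shuffled]


if __name__ == '__main__':
    lines = ["banana apple cherry", "avocado blueberry"]
    for index, output in enumerate(simulate(lines, num_reducers=2)):
        print(index, output)
```

With a single reducer, every key ends up in the same output. With more reducers, each one sorts only its share of the letters, which is the behaviour the question is after.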