from mrjob.job import MRJob
from mrjob.step import MRStep
import re
import numpy as np

WORD_RE = re.compile(r"[\w']+")


class WordCount(MRJob):

    def mapper(self, _, line):
        for word in WORD_RE.findall(line):
            yield (word.lower(), 1)

    def combiner(self, word, counts):
        yield (word, sum(counts))

    def reducer(self, word, counts):
        yield (word, sum(counts))

Here I use a very simple word count MapReduce job to test Hadoop mode, but I got an error:
Unrecognized option: -D
Try -help for more information
Streaming Command Failed!
Usage: $HADOOP_HOME/bin/hadoop jar hadoop-streaming.jar [options]
I ran this job on Windows 10.