Yelp / mrjob

Run MapReduce jobs on Hadoop or Amazon Web Services
http://packages.python.org/mrjob/

What values should I yield if I want to delete some rows? #2085

Open xyl576807077 opened 5 years ago

xyl576807077 commented 5 years ago

Hello, I want to delete some rows based on a condition, so I yield None, None for those rows. This raises an error:

Traceback (most recent call last):
  File "task.py", line 39, in <module>
    MR_First.run()
  File "/home/yulunxiao/miniconda2/lib/python2.7/site-packages/mrjob/job.py", line 446, in run
    mr_job.execute()
  File "/home/yulunxiao/miniconda2/lib/python2.7/site-packages/mrjob/job.py", line 473, in execute
    super(MRJob, self).execute()
  File "/home/yulunxiao/miniconda2/lib/python2.7/site-packages/mrjob/launch.py", line 202, in execute
    self.run_job()
  File "/home/yulunxiao/miniconda2/lib/python2.7/site-packages/mrjob/launch.py", line 247, in run_job
    runner.run()
  File "/home/yulunxiao/miniconda2/lib/python2.7/site-packages/mrjob/runner.py", line 508, in run
    self._run()
  File "/home/yulunxiao/miniconda2/lib/python2.7/site-packages/mrjob/sim.py", line 159, in _run
    self._run_step(step, step_num)
  File "/home/yulunxiao/miniconda2/lib/python2.7/site-packages/mrjob/sim.py", line 168, in _run_step
    self._run_streaming_step(step, step_num)
  File "/home/yulunxiao/miniconda2/lib/python2.7/site-packages/mrjob/sim.py", line 185, in _run_streaming_step
    self._run_reducers(step_num, num_reducer_tasks)
  File "/home/yulunxiao/miniconda2/lib/python2.7/site-packages/mrjob/sim.py", line 289, in _run_reducers
    for task_num in range(num_reducer_tasks)
  File "/home/yulunxiao/miniconda2/lib/python2.7/site-packages/mrjob/sim.py", line 128, in _run_multiple
    func()
  File "/home/yulunxiao/miniconda2/lib/python2.7/site-packages/mrjob/sim.py", line 741, in _run_task
    stdin, stdout, stderr, wd, env)
  File "/home/yulunxiao/miniconda2/lib/python2.7/site-packages/mrjob/inline.py", line 132, in invoke_task
    task.execute()
  File "/home/yulunxiao/miniconda2/lib/python2.7/site-packages/mrjob/job.py", line 467, in execute
    self.run_reducer(self.options.step_num)
  File "/home/yulunxiao/miniconda2/lib/python2.7/site-packages/mrjob/job.py", line 572, in run_reducer
    for k, v in self.reduce_pairs(read_lines(), step_num=step_num):
  File "/home/yulunxiao/miniconda2/lib/python2.7/site-packages/mrjob/job.py", line 643, in reduce_pairs
    for k, v in self._combine_or_reduce_pairs(pairs, 'reducer', step_num):
  File "/home/yulunxiao/miniconda2/lib/python2.7/site-packages/mrjob/job.py", line 664, in _combine_or_reduce_pairs
    for key, pairs_for_key in itertools.groupby(pairs, lambda k_v: k_v[0]):
  File "/home/yulunxiao/miniconda2/lib/python2.7/site-packages/mrjob/job.py", line 767, in read_lines
    key, value = read(line.rstrip(b'\r\n'))
  File "/home/yulunxiao/miniconda2/lib/python2.7/site-packages/mrjob/protocol.py", line 90, in read
    raw_key, raw_value = line.split(b'\t', 1)
ValueError: need more than 1 value to unpack
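The last two frames show where it breaks. The reducer reads each intermediate line back through a protocol whose read() splits the line once on a tab and unpacks the result into a key and a value. The sketch below isolates that single statement to show that any line without a tab triggers exactly this ValueError. read_pair is a hypothetical helper name, not an mrjob function, and what produced the tab-less line in this particular job is not visible from the traceback alone.

```python
def read_pair(line):
    """Split one intermediate line into (raw_key, raw_value).

    Hypothetical helper mirroring the failing statement in mrjob/protocol.py
    from the traceback: the line must contain at least one tab, otherwise
    unpacking a one-element list into two names raises ValueError.
    """
    raw_key, raw_value = line.split(b"\t", 1)
    return raw_key, raw_value


if __name__ == "__main__":
    # A well-formed key<TAB>value line splits cleanly.
    print(read_pair(b'"1"\t["3", "54"]'))
    # A line with no tab cannot be split into two parts.
    try:
        read_pair(b"no tab here")
    except ValueError as exc:
        print("ValueError:", exc)
```

Under Python 3 the message reads "not enough values to unpack (expected 2, got 1)" instead of the Python 2 wording above.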
coyotemarin commented 5 years ago

Could you show me your job?

xyl576807077 commented 5 years ago


For example, I have a file like this (columns separated by spaces):

1 3 54
2 0 59
3 18 23
4 8 12

I want to delete the rows whose second-column value is less than 7.

If I write the code below, is it correct?

def mapper(self, _, line):
    line = line.split(' ')
    if line[1] < 7:
        yield None, None
    else:
        yield line[0], line[1:]
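A minimal sketch of the usual MapReduce approach, assuming the goal is simply to drop rows: a mapper may yield zero, one, or many pairs per input line, so a row is deleted by not yielding anything for it, rather than by yielding a placeholder such as None, None. The sketch also converts the second column to int before comparing, since split() returns strings. The mapper is written as a plain function (no self, no mrjob import) so it runs on its own; run_map_phase is a hypothetical local driver standing in for the framework, not part of mrjob.

```python
def mapper(_, line):
    """Map one input line to zero or one (key, value) pairs.

    Same body you would put in an mrjob ``mapper(self, _, line)`` method,
    minus ``self``.
    """
    fields = line.split()  # split on any run of whitespace
    # Fields are strings; convert before comparing with a number.
    # In Python 3, "0" < 7 raises TypeError; in Python 2 a str always
    # compares greater than an int, so the test would always be False.
    if int(fields[1]) < 7:
        return  # drop the row: emit nothing for this line
    yield fields[0], fields[1:]


def run_map_phase(lines):
    """Hypothetical local driver: apply mapper to each line, collect all pairs."""
    pairs = []
    for line in lines:
        pairs.extend(mapper(None, line))
    return pairs


if __name__ == "__main__":
    sample = ["1 3 54", "2 0 59", "3 18 23", "4 8 12"]
    for key, value in run_map_phase(sample):
        print(key, value)
```

Inside an MRJob subclass, the same body would go in def mapper(self, _, line):. With the sample file above, rows 1 and 2 produce no output and only rows 3 and 4 are passed on.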
xyl576807077 commented 5 years ago

Looking forward to your response.