This is a problem with our `mapreduce` version of the submitter. The original `mapred` submitter is unaffected.
The minimal setup is a map-only app that uses the Java record reader and writer:

    import pydoop.mapreduce.api as api
    import pydoop.mapreduce.pipes as pipes


    class Mapper(api.Mapper):

        def map(self, context):
            # emit each input key together with the length of its value
            context.emit(context.key, len(context.value))


    def __main__():
        pipes.run_task(pipes.Factory(mapper_class=Mapper))

Run this with only one mapper on a substantial amount of input (e.g., `examples/input/alice_1.txt` replicated 1000 times). Monitor the job on the console: with our `mapreduce` submitter, progress stays stuck at 0%, then jumps to 100% right before the end of the job. With the `mapred` submitter, progress is updated gradually, as expected.
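To build the large input described above, a small helper like the following can concatenate copies of the sample file into a single input file. This is only a sketch: the function name `replicate` and the output file name are hypothetical, and an equivalent shell loop would work just as well.

```python
import shutil
from pathlib import Path


def replicate(src, dst, times=1000):
    """Write `times` back-to-back copies of `src` into `dst` (hypothetical helper)."""
    src, dst = Path(src), Path(dst)
    with dst.open("wb") as out:
        for _ in range(times):
            # reopen the source each time and stream it into the output
            with src.open("rb") as inp:
                shutil.copyfileobj(inp, out)
    return dst
```

For example, `replicate("examples/input/alice_1.txt", "alice_x1000.txt")` writes the 1000-copy file, which can then be uploaded to HDFS and used as the job's input.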
Note that this was NOT fixed by https://github.com/crs4/pydoop/pull/322.