Yelp / mrjob

Run MapReduce jobs on Hadoop or Amazon Web Services
http://packages.python.org/mrjob/

Write counters to local filepath when spark-tmp-dir is not an S3 path. #2177

Closed: 88manpreet closed this 4 years ago

88manpreet commented 4 years ago

Issue: #2176. The proposed fix is to write the counters to the given local file path with a Python file writer when the part-* (counters) file is not created on the driver.
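The fallback described above can be sketched roughly as follows. This is not the code from the PR: the function names `is_s3_uri` and `write_counters_locally`, the set of S3 URI schemes, and the choice of JSON as the on-disk counters format are all hypothetical assumptions made for illustration.

```python
import json
import os
from urllib.parse import urlparse

# Assumption: these schemes mark a path as S3; mrjob's own check may differ.
S3_SCHEMES = {"s3", "s3a", "s3n"}


def is_s3_uri(path):
    """Return True if path looks like an S3 URI, e.g. s3://bucket/key."""
    return urlparse(path).scheme in S3_SCHEMES


def write_counters_locally(counters, path):
    """Hypothetical helper: write counters to a local file as JSON.

    Returns True if the counters were written locally, or False if the
    path is an S3 URI (the existing S3 code path would handle that case).
    """
    if is_s3_uri(path):
        return False

    # Accept file:// URIs by stripping the scheme down to a plain path.
    parsed = urlparse(path)
    if parsed.scheme == "file":
        path = parsed.path

    # Create the parent directory if needed, then write with a plain
    # Python file writer instead of going through Spark and S3.
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, "w") as f:
        json.dump(counters, f)
    return True
```

For example, `write_counters_locally({"group": {"counter": 1}}, "/tmp/counters.json")` writes the file directly on the driver, while an `s3://...` path returns `False` and leaves the S3 behavior unchanged.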

A possible additional benefit of the fix is avoiding the communication overhead (latency and network access) of the Spark harness populating an S3 bucket and the Spark driver retrieving content from that bucket, as well as extra operations such as deleting the bucket. In my test runs, the job hung for a long time when given an S3 path.

Other changes include fixes for flake lint errors.