Yelp / mrjob

Run MapReduce jobs on Hadoop or Amazon Web Services
http://packages.python.org/mrjob/

Docker support on EMR #2184

Closed by coyotemarin 4 years ago

coyotemarin commented 4 years ago

Adds support for running Spark executors in Docker containers on EMR. It's activated by setting the docker_image option. Also adds the docker_mounts and docker_client_config options (the latter so you can log into ECR). Fixes #2179.
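For readers unfamiliar with how Docker on YARN is usually switched on for Spark, here is a rough sketch of how the three options might translate into Spark properties. This is not the code from this PR: docker_spark_conf and to_spark_submit_args are hypothetical helpers, and the mapping assumes the standard YARN container-runtime environment variables passed through Spark's spark.executorEnv.* properties.

```python
def docker_spark_conf(docker_image, docker_mounts=None,
                      docker_client_config=None):
    """Hypothetical helper (not mrjob's API): build Spark properties that
    ask YARN to run executors inside a Docker container."""
    if not docker_image:
        # Docker support is only activated when an image is given
        return {}

    env = {
        'YARN_CONTAINER_RUNTIME_TYPE': 'docker',
        'YARN_CONTAINER_RUNTIME_DOCKER_IMAGE': docker_image,
    }

    if docker_mounts:
        # YARN expects a comma-separated list of source:dest:mode entries
        env['YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS'] = ','.join(docker_mounts)

    if docker_client_config:
        # assumed to be the URI of a Docker client config.json,
        # e.g. one holding ECR credentials
        env['YARN_CONTAINER_RUNTIME_DOCKER_CLIENT_CONFIG'] = (
            docker_client_config)

    # spark.executorEnv.<NAME> sets environment variable NAME on executors
    return {'spark.executorEnv.' + name: value
            for name, value in env.items()}


def to_spark_submit_args(conf):
    """Flatten a dict of Spark properties into --conf arguments."""
    args = []
    for key in sorted(conf):
        args.extend(['--conf', '%s=%s' % (key, conf[key])])
    return args


if __name__ == '__main__':
    conf = docker_spark_conf(
        'amazoncorretto:8',
        docker_mounts=['/etc/passwd:/etc/passwd:ro'],
    )
    print(' '.join(to_spark_submit_args(conf)))
```

The sketch only sets executor-side properties, matching the description above; whether the application master also runs in the container is not covered here.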

If you want to test this out without building a Docker image, try python -m mrjob.examples.mr_spark_wordcount --docker-image amazoncorretto:8 on Python 2.

88manpreet commented 4 years ago

Looks good to me. Left some questions. +1 for merging this.