aws-samples / emr-serverless-samples

Example code for running Spark and Hive jobs on EMR Serverless.
https://aws.amazon.com/emr/serverless/
MIT No Attribution

[pyspark-dependencies] - DockerFile does not automatically move the tar.gz file to the local folder #34

Closed anselboero closed 2 years ago

anselboero commented 2 years ago

Hi, I am trying to execute the first two commands listed in the README:

docker build --output . .
aws s3 cp pyspark_ge.tar.gz s3://${S3_BUCKET}/artifacts/pyspark/

I noticed that the file pyspark_ge.tar.gz was not written locally. I had to run a container from the image built by the first command and then copy the file out with docker cp.

I was wondering if it was just my problem. If not, I volunteer to update the documentation with the two additional commands I had to run. Thank you very much.

dacort commented 2 years ago

Hi @anselboero - a couple of questions.

anselboero commented 2 years ago

Thank you for your prompt reply.

Server: Docker Engine - Community
 Engine:
  Version:       20.10.18
  API version:   1.41 (minimum version 1.12)
  Go version:    go1.18.6
  Git commit:    e42327a
  Built:         Thu Sep 8 23:09:30 2022
  OS/Arch:       linux/amd64
  Experimental:  false
 containerd:
  Version:       1.6.8
  GitCommit:     9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6
 runc:
  Version:       1.1.4
  GitCommit:     v1.1.4-0-g5fd4c4d
 docker-init:
  Version:       0.19.0
  GitCommit:     de40ad0

dacort commented 2 years ago

Hm, ok, let me give it a try on Linux real quick. It should run fine.

dacort commented 2 years ago

Ah, I think I figured it out. The custom build outputs feature requires the BuildKit backend, so the classic builder silently ignores --output and never exports the file to the host.

You should be able to use either of the following commands. I'll add a note about this to the README.

docker buildx build --output . .
DOCKER_BUILDKIT=1 docker build --output . .
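For context, here is a minimal sketch of the multi-stage Dockerfile pattern that --output depends on. This is an illustrative reconstruction, not the repo's exact file: the stage names, package list, and output path are assumptions. The key idea is a final FROM scratch stage whose filesystem BuildKit writes to the host directory given by --output; the classic builder does not implement this, which matches the behavior reported above.

```dockerfile
# Build stage (illustrative): package a virtualenv as a tar.gz
FROM amazonlinux:2 AS base
RUN yum install -y python3 && \
    python3 -m venv /opt/venv && \
    /opt/venv/bin/pip install great_expectations venv-pack && \
    mkdir /output && \
    /opt/venv/bin/venv-pack -p /opt/venv -o /output/pyspark_ge.tar.gz

# Export stage: with BuildKit and `--output .`, the contents of this
# stage are copied to the host instead of being built into an image,
# so pyspark_ge.tar.gz lands in the current directory.
FROM scratch AS export
COPY --from=base /output/pyspark_ge.tar.gz /
```

Without BuildKit, the same build completes "successfully" but only produces an image, which is why running a container and using docker cp appeared to be necessary.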
anselboero commented 2 years ago

great, it worked! Thank you very much for your help