aws-samples / emr-serverless-samples

Example code for running Spark and Hive jobs on EMR Serverless.
https://aws.amazon.com/emr/serverless/
MIT No Attribution

[pyspark-dependencies] - DockerFile does not automatically move the tar.gz file to the local folder #34

Closed anselboero closed 2 years ago

anselboero commented 2 years ago

Hi, I am trying to execute the first two commands listed in the README:

docker build --output . .
aws s3 cp pyspark_ge.tar.gz s3://${S3_BUCKET}/artifacts/pyspark/

I noticed that the file pyspark_ge.tar.gz was not written locally. I had to run a container from the image built by the first command and then copy the file out with docker cp.

I was wondering if it was just my problem. If not, I volunteer to update the documentation with the two additional commands I had to run. Thank you very much.

dacort commented 2 years ago

Hi @anselboero - a couple of questions.

anselboero commented 2 years ago

Thank you for your prompt reply.

Server: Docker Engine - Community
 Engine:
  Version:       20.10.18
  API version:   1.41 (minimum version 1.12)
  Go version:    go1.18.6
  Git commit:    e42327a
  Built:         Thu Sep 8 23:09:30 2022
  OS/Arch:       linux/amd64
  Experimental:  false
 containerd:
  Version:       1.6.8
  GitCommit:     9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6
 runc:
  Version:       1.1.4
  GitCommit:     v1.1.4-0-g5fd4c4d
 docker-init:
  Version:       0.19.0
  GitCommit:     de40ad0

dacort commented 2 years ago

Hm, ok, let me give it a try on Linux real quick. It should run fine.

dacort commented 2 years ago

Ah, I think I figured it out. The custom build outputs feature requires the BuildKit backend, so the classic builder silently ignores --output and never exports the file to the host.

You should be able to use either of the following commands. I'll add a note about this to the README.

docker buildx build --output . .
DOCKER_BUILDKIT=1 docker build --output . .
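For context, here is a minimal sketch of the multi-stage Dockerfile pattern that --output depends on. This is an illustrative reconstruction, not the repo's exact file: the stage names, package list, and output path are assumptions. The key idea is a final FROM scratch stage whose filesystem BuildKit writes to the host directory given by --output; the classic builder does not implement this, which matches the behavior reported above.

```dockerfile
# Build stage (illustrative): package a virtualenv as a tar.gz
FROM amazonlinux:2 AS base
RUN yum install -y python3 && \
    python3 -m venv /opt/venv && \
    /opt/venv/bin/pip install great_expectations venv-pack && \
    mkdir /output && \
    /opt/venv/bin/venv-pack -p /opt/venv -o /output/pyspark_ge.tar.gz

# Export stage: with BuildKit and `--output .`, the contents of this
# stage are copied to the host instead of being built into an image,
# so pyspark_ge.tar.gz lands in the current directory.
FROM scratch AS export
COPY --from=base /output/pyspark_ge.tar.gz /
```

Without BuildKit, the same build completes "successfully" but only produces an image, which is why running a container and using docker cp appeared to be necessary.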
anselboero commented 2 years ago

great, it worked! Thank you very much for your help