Closed D3tenney closed 2 years ago
The credentials and config files do exist at `/home/glue_user/.aws/`. They show up when I run `ls /home/glue_user/.aws/`.
The command mounts your local `~/.aws` as `/home/glue_user/.aws` in the container. I would recommend verifying that you have that folder in your local user's home directory.
I have `.aws` in my local home directory. Per my comment, the directory shows up as mounted, though I get a permissions error when I run `cat /home/glue_user/.aws/credentials`.
When I run `ls -la /home/glue_user/.aws/`, I see that the config and credentials files aren't owned by `glue_user`, but by uid `1000` and group `ssh_keys`. Is that how they're supposed to be?
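For context on why the owner shows up as a bare number: a bind-mounted file keeps the numeric uid/gid it has on the host, and when no user inside the container maps to that uid, `ls -la` can only print the number (here `1000`). A minimal sketch of checking a file's numeric ownership (the path is a throwaway example, not from this thread):

```shell
# Create a throwaway file and inspect its numeric owner -- the same check
# you would run against /home/glue_user/.aws/credentials in the container.
mkdir -p /tmp/uid-demo
touch /tmp/uid-demo/credentials
# GNU stat: %u = owning uid, %g = owning gid
stat -c '%u %g' /tmp/uid-demo/credentials
```

On a typical single-user Linux host the first user has uid 1000, so files it creates show `1000 1000`; inside the container that uid has no matching entry in `/etc/passwd`, hence the bare number.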
Yes, you are right. At the very least, you need to make the credentials file accessible from the container.
Here's my example. I was able to read `/home/glue_user/.aws/credentials` from the container.
The `.aws` directory in my local home directory:

```
$ ls -la ~/.aws/
total 16
drwxr-xr-x   5 sekiyama staff  160 Aug 19 11:35 .
drwxr-xr-x+ 89 sekiyama staff 2848 Sep  7 21:31 ..
-rw-------   1 sekiyama staff 3028 Jul 19 20:55 config
-rw-------   1 sekiyama staff  116 Nov 18  2021 credentials
...
```
Starting `pyspark` and then running the `ls -la` command inside the container, the local host machine's `~/.aws/` directory is mounted as the container's `/home/glue_user/.aws/`:

```
$ docker run -it -v ~/.aws:/home/glue_user/.aws -e AWS_PROFILE=$PROFILE_NAME -e DISABLE_SSL=true --rm -p 4040:4040 -p 18080:18080 --name glue_pyspark amazon/aws-glue-libs:glue_libs_3.0.0_image_01
starting org.apache.spark.deploy.history.HistoryServer, logging to /home/glue_user/spark/logs/spark-glue_user-org.apache.spark.deploy.history.HistoryServer-1-60ff2821cca0.out
[glue_user@60ff2821cca0 workspace]$ ls -la ~/.aws/
total 16
drwxr-xr-x 5 glue_user root  160 Aug 19 02:35 .
drwx------ 1 glue_user root 4096 Sep  8 05:00 ..
-rw------- 1 glue_user root 3028 Jul 19 11:55 config
-rw------- 1 glue_user root  116 Nov 18  2021 credentials
...
```
Starting `pyspark` and then running the `cat` command inside the container, it returns the file contents without a permission error:

```
[glue_user@60ff2821cca0 workspace]$ cat /home/glue_user/.aws/credentials
[default]
aws_access_key_id = AKIAXXXXXXXXXX
aws_secret_access_key = YYYYYYYYYY
[glue_user@dfe045837017 workspace]$ id
uid=10000(glue_user) gid=0(root) groups=0(root)
```
The permission error you got looks like a general Docker volume issue (not specific to Glue's Docker image). There can be multiple options to solve it.
Since `glue_user` exists in the container, one possible option is to create a new user on the host machine with the same uid as the container's `glue_user` (uid 10000) and then use that user's directory.
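Two common ways to do that are sketched below. The paths and user name are illustrative assumptions, not from this thread, and the `chmod` route trades safety for convenience, so prefer it only for throwaway local credentials:

```shell
# Option A (quick, less safe): make the mounted files readable by any uid,
# including the container's glue_user (uid 10000).
# Demonstrated on a throwaway copy rather than the real ~/.aws:
mkdir -p /tmp/aws-demo
printf '[default]\n' > /tmp/aws-demo/credentials
chmod 644 /tmp/aws-demo/credentials
ls -l /tmp/aws-demo/credentials   # now -rw-r--r--

# Option B (safer, needs root): create a host user whose uid matches the
# container's glue_user, and mount that user's copy of the directory.
# sudo useradd -u 10000 glue_user
# sudo cp -r ~/.aws /home/glue_user/.aws
# sudo chown -R glue_user /home/glue_user/.aws
```

With Option A the container's `glue_user` can read the file through the "other" permission bits; with Option B the numeric uid on the files matches `glue_user` inside the container, so normal owner permissions apply.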
I'm following along with this documentation: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-libraries.html

Using that setup, when I run:

```python
df = spark.read.csv('s3://<bucket>/<key>')
```

I get an error: I don't have permission to access my mounted credentials. Is this a common issue?