awslabs / aws-glue-libs

AWS Glue Libraries are additions and enhancements to Spark for ETL operations.

Cannot Read AWS Credentials #140

Closed D3tenney closed 2 years ago

D3tenney commented 2 years ago

I'm following along with this documentation: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-libraries.html

Using

PROFILE_NAME=default
docker run -it \
-v ~/.aws:/home/glue_user/.aws \
-e AWS_PROFILE=$PROFILE_NAME \
-e DISABLE_SSL=true \
--rm \
-p 4040:4040 \
-p 18080:18080 \
--name glue_pyspark \
amazon/aws-glue-libs:glue_libs_3.0.0_image_01 \
pyspark

and when I run df = spark.read.csv('s3://<bucket>/<key>') I get:

WARN BasicProfileConfigFileLoader: Unable to load config file /home/glue_user/.aws/config
com.amazonaws.SdkClientException: Unable to load AWS credential profiles file at: /home/glue_user/.aws/config

I don't have permission to access my mounted credentials. Is this a common issue?

D3tenney commented 2 years ago

The credentials and config files do exist at /home/glue_user/.aws/. They show up when I run ls /home/glue_user/.aws/.

moomindani commented 2 years ago

The command mounts your local ~/.aws as /home/glue_user/.aws in the container. I recommend verifying that the folder exists in your local user's home directory.

D3tenney commented 2 years ago

I have .aws in my local home directory. Per my comment, the directory shows up as mounted, though I get a permissions error when I run cat /home/glue_user/.aws/credentials.

When I run ls -la /home/glue_user/.aws/, I see that the config and credentials files aren't owned by glue_user, but by 1000 and in group ssh_keys. Is that how they're supposed to be?

moomindani commented 2 years ago

Yes, you are right. At a minimum, you need to make the credentials file accessible from the container. Here's my example, in which I was able to read /home/glue_user/.aws/credentials from the container.

Host machine

$ ls -la ~/.aws/
total 16
drwxr-xr-x   5 sekiyama  staff   160 Aug 19 11:35 .
drwxr-xr-x+ 89 sekiyama  staff  2848 Sep  7 21:31 ..
-rw-------   1 sekiyama  staff  3028 Jul 19 20:55 config
-rw-------   1 sekiyama  staff   116 Nov 18  2021 credentials
...

Docker container

$ docker run -it -v ~/.aws:/home/glue_user/.aws -e AWS_PROFILE=$PROFILE_NAME -e DISABLE_SSL=true --rm -p 4040:4040 -p 18080:18080 --name glue_pyspark amazon/aws-glue-libs:glue_libs_3.0.0_image_01
starting org.apache.spark.deploy.history.HistoryServer, logging to /home/glue_user/spark/logs/spark-glue_user-org.apache.spark.deploy.history.HistoryServer-1-60ff2821cca0.out
[glue_user@60ff2821cca0 workspace]$ ls -la ~/.aws/
total 16
drwxr-xr-x 5 glue_user root  160 Aug 19 02:35 .
drwx------ 1 glue_user root 4096 Sep  8 05:00 ..
-rw------- 1 glue_user root 3028 Jul 19 11:55 config
-rw------- 1 glue_user root  116 Nov 18  2021 credentials
...

[glue_user@dfe045837017 workspace]$ id
uid=10000(glue_user) gid=0(root) groups=0(root)
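To illustrate why the ownership mismatch matters, here is a minimal sketch of the standard POSIX read check for a non-root process. It uses the numbers from this thread (files owned by uid 1000 on the reporter's side, glue_user running as uid 10000 with gid 0). The function name `can_read`, the 0600 mode, and the gid of 1000 for the ssh_keys group are assumptions for illustration; ACLs and capabilities are ignored.

```python
import stat


def can_read(mode, owner_uid, owner_gid, uid, gids):
    """Approximate the POSIX read check for a non-root process.

    Only one permission class applies: owner bits if the uid matches,
    otherwise group bits if the file's group is one of the process's
    groups, otherwise the "other" bits.
    """
    if uid == owner_uid:
        return bool(mode & stat.S_IRUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


MODE = 0o600  # assumed: -rw-------, as in the host listing above

# File owned by host uid 1000 (gid 1000 is a placeholder for ssh_keys),
# read by glue_user (uid 10000, gid 0): falls through to the "other" bits.
print(can_read(MODE, 1000, 1000, uid=10000, gids={0}))  # False

# Same mode, but the file is owned by uid 10000: the owner bits apply.
print(can_read(MODE, 10000, 0, uid=10000, gids={0}))  # True
```

With mode 0600, only a process whose uid matches the file owner gets read access, which is consistent with the permission error on `cat /home/glue_user/.aws/credentials`.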



The permission error you got looks like a general Docker volume issue (not specific to Glue's Docker image). There are multiple ways to solve it.
Since `glue_user` exists in the container, one option is to create a new user on the host machine with the same uid as the container's `glue_user` user, and then mount that user's directory.
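Before trying that, it may help to check on the host which files would be affected. The following is a rough sketch, not part of the Glue image or its tooling: the helper name `files_not_owned_by` is hypothetical, and the uid 10000 is taken from the `id` output above. It assumes a Linux host, where bind mounts generally keep the numeric owner uid of each file.

```python
from pathlib import Path

CONTAINER_UID = 10000  # uid of glue_user, per the `id` output in this thread


def files_not_owned_by(directory, uid=CONTAINER_UID):
    """Return the sorted names of regular files in `directory`
    whose owner uid differs from `uid`."""
    mismatched = []
    for entry in Path(directory).iterdir():
        if entry.is_file() and entry.stat().st_uid != uid:
            mismatched.append(entry.name)
    return sorted(mismatched)


# Example call against the directory that gets mounted into the container:
# files_not_owned_by(Path.home() / ".aws")
```

Any file this returns is one that the container's `glue_user` would not own after mounting, so whether it is readable depends on its group and other permission bits.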