JeevaTM closed this issue 2 years ago
Hi @JeevaTM thank you for reaching out.
It looks like there are two issues in the environment: "EC2 Instance Metadata Service is disabled" and "Class class com.amazonaws.auth.DefaultAWSCredentialsProviderChain does not implement AWSCredentialsProvider".
Both seem to be related to Spark/Hadoop FileSystem environment misconfigurations, so the best chance of understanding and fixing these issues is to reach out to Hadoop support.
@JeevaTM I am also experiencing the same issue. Did you find a workaround?
Caused by: MetaException(message:Got exception: java.io.IOException Class class com.amazonaws.auth.DefaultAWSCredentialsProviderChain does not implement AWSCredentialsProvider)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:1390)
    at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:107)
@labbedaine I resolved the authentication issues, but creating Iceberg tables from the Glue Docker image or from the AWS Glue service itself is not recommended. Because AWS Glue's Spark is old, the Iceberg connector falls back to 0.12.0 and therefore creates Iceberg v1 tables.
These Iceberg v1 tables are generally not compatible with most AWS services: the AWS Glue Data Catalog does not show columns, AWS Athena does not show a column preview, and AWS QuickSight cannot query an Iceberg table created by AWS Glue.
I found that it works and does not break anything if you create the Iceberg table from AWS Athena and run upserts from AWS EMR 6.7 (the latest one). Regardless, here is my Dockerfile for a local Glue image with the Iceberg jar files downloaded. Set your AWS access key and secret key in the ENV lines and you are good to go.
FROM amazon/aws-glue-libs:glue_libs_3.0.0_image_01

ENV AWS_REGION="us-east-1"
ENV DISABLE_SSL=true
# This is my organisation's certificate authority bundle
ENV AWS_CA_BUNDLE="/etc/pki/ca-trust/source/anchors/ca-bundle.pem"
ENV AWS_ACCESS_KEY_ID=''
ENV AWS_SECRET_ACCESS_KEY=''

USER 0

# Copy the current directory, which has the pyspark scripts to run in the Glue container
COPY . /home/glue_user/workspace/

COPY cred/ca-bundle.pem /etc/pki/ca-trust/source/anchors/
RUN update-ca-trust

# My organisation's proxy certificate
COPY cred/zscaler.crt /home/glue_user/
WORKDIR /usr/lib/jvm/java-1.8.0-amazon-corretto.x86_64/jre/lib/security/
RUN keytool -import -keystore ./cacerts -trustcacerts -file /home/glue_user/zscaler.crt -storepass changeit -noprompt

WORKDIR /home/glue_user/spark/jars/
RUN wget https://repo1.maven.org/maven2/software/amazon/awssdk/bundle/2.15.40/bundle-2.15.40.jar
RUN wget https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark3-runtime/0.12.0/iceberg-spark3-runtime-0.12.0.jar
RUN wget https://repo1.maven.org/maven2/software/amazon/awssdk/url-connection-client/2.15.40/url-connection-client-2.15.40.jar

# I was testing AWS SSO login instead of the configured environment variables
WORKDIR /home/glue_user
RUN curl https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip -o awscliv2.zip
RUN unzip awscliv2.zip
RUN ./aws/install

WORKDIR /home/glue_user/aws-glue-libs/awsglue/scripts/
RUN cat /home/glue_user/workspace/cred/ca-bundle.pem >> /home/glue_user/.local/lib/python3.7/site-packages/certifi/cacert.pem
RUN cat /home/glue_user/workspace/cred/ca-bundle.pem >> /usr/local/lib/python3.7/site-packages/certifi/cacert.pem

COPY aws /home/glue_user/.aws/
RUN cp -r /home/glue_user/.aws/ /root/.aws/

RUN chown -R glue_user /home/glue_user/

WORKDIR /home/glue_user/workspace/
# This user is glue_user
USER 10000
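The Athena-then-EMR workflow mentioned before the Dockerfile could be sketched as two SQL statements, built here as Python strings. The database, table, column, catalog, and bucket names are hypothetical placeholders, and this is only the general shape of such statements, not the exact ones used in this thread.

```python
def athena_create_iceberg_table(database, table, location):
    """Build an Athena DDL statement that creates an Iceberg table.

    The two-column schema (id, name) is a made-up example.
    """
    return (
        f"CREATE TABLE {database}.{table} (id bigint, name string) "
        f"LOCATION '{location}' "
        "TBLPROPERTIES ('table_type' = 'ICEBERG')"
    )


def spark_merge_upsert(catalog, database, table, source_view):
    """Build a Spark SQL MERGE INTO statement for an upsert keyed on id.

    Running it needs a Spark session with Iceberg's SQL extensions enabled.
    """
    target = f"{catalog}.{database}.{table}"
    return (
        f"MERGE INTO {target} t USING {source_view} s ON t.id = s.id "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(athena_create_iceberg_table("demo_db", "customers", "s3://example-bucket/customers/"))
    print(spark_merge_upsert("glue_catalog", "demo_db", "customers", "updates"))
```

The first statement would be run in the Athena query editor, the second through spark.sql(...) on EMR.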
How did you solve the authentication issue, @JeevaTM?
Describe the bug
I am trying to create a Glue Catalog table using the AWS Glue Docker image with pyspark.
I am using the following pyspark script to first view the database and create the table next.
This works perfectly fine in the AWS environment. Running the same script in the local Docker image, I was only able to view the database; the create table statement failed.
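The script itself is not shown here. Purely as an illustration of the two steps described, and not the reporter's actual code, a script of this shape might look as follows. The catalog name glue_catalog, the warehouse path, and the table schema are made-up placeholders. The configuration keys are the ones Iceberg documents for its Glue catalog, and a given Iceberg version may need more of them.

```python
def iceberg_glue_conf(catalog, warehouse):
    """Return Spark settings that register an Iceberg catalog backed by AWS Glue."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.warehouse": warehouse,
    }


def build_session(conf):
    """Create a SparkSession from a dict of settings (needs pyspark installed)."""
    # Imported lazily so iceberg_glue_conf can be used without Spark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("iceberg-glue-sketch")
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def view_then_create(spark, catalog, database, table):
    """Mirror the report: list the namespaces first, then try to create a table."""
    spark.sql(f"SHOW NAMESPACES IN {catalog}").show()
    spark.sql(
        f"CREATE TABLE IF NOT EXISTS {catalog}.{database}.{table} "
        "(id bigint, name string) USING iceberg"
    )
```

Inside the container this would be used as view_then_create(build_session(iceberg_glue_conf("glue_catalog", "s3://example-bucket/warehouse/")), "glue_catalog", "demo_db", "demo_table").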
Error:
DefaultAWSCredentialsProviderChain extends AWSCredentialsProviderChain, which implements AWSCredentialsProvider, so the class should satisfy this check.
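One possible explanation, which nothing in this thread confirms, is that the class and the interface were loaded from two separate copies of the SDK. On the JVM, the same class loaded by two class loaders counts as two unrelated types. The Python analogy below only illustrates that effect; it uses stand-in classes with the same names and does not touch the real SDK.

```python
# Stand-in definitions that mimic the SDK's class hierarchy.
SDK_SOURCE = """
class AWSCredentialsProvider:
    pass

class AWSCredentialsProviderChain(AWSCredentialsProvider):
    pass

class DefaultAWSCredentialsProviderChain(AWSCredentialsProviderChain):
    pass
"""


def load_sdk_copy():
    """Execute the definitions in a fresh namespace, much like a separate class loader."""
    namespace = {}
    exec(SDK_SOURCE, namespace)
    return namespace


copy_a = load_sdk_copy()
copy_b = load_sdk_copy()

# Class and interface from the same copy: the relationship holds.
same_copy = issubclass(
    copy_a["DefaultAWSCredentialsProviderChain"], copy_a["AWSCredentialsProvider"]
)
# Class from one copy, interface from the other: the check fails,
# even though the source text is identical.
cross_copy = issubclass(
    copy_b["DefaultAWSCredentialsProviderChain"], copy_a["AWSCredentialsProvider"]
)

print(same_copy, cross_copy)  # True False
```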
Expected Behavior
Create Glue Catalog table on Spark SQL from AWS Glue Docker image
Current Behavior
Stack trace:
Reproduction Steps
Create an IAM user with the necessary S3, Glue, and Athena access.
Create a Glue Catalog database.
Go to AWS Lake Formation and grant create table access to the created IAM user.
Configure AWS credentials in the environment.
Pull the AWS Glue Docker image amazon/aws-glue-libs:glue_libs_3.0.0_image_01 and spin up the container.
Run the following pyspark script:
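The original script is not reproduced here. Separately, before running anything, it may help to confirm that the credentials from the earlier step are actually visible inside the container. A minimal sketch, not part of the original report, assuming the standard variable names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY; it also reports AWS_EC2_METADATA_DISABLED, since that setting would be consistent with the "EC2 Instance Metadata Service is disabled" message.

```python
import os

# Standard names read by the AWS SDKs' environment-variable credential providers.
REQUIRED = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_credentials(env):
    """Return the required credential variables that are unset or empty in env."""
    return [name for name in REQUIRED if not env.get(name)]


def metadata_service_disabled(env):
    """Report whether AWS_EC2_METADATA_DISABLED is set to true in env."""
    return env.get("AWS_EC2_METADATA_DISABLED", "").lower() == "true"


if __name__ == "__main__":
    print("missing:", missing_credentials(os.environ))
    print("metadata service disabled:", metadata_service_disabled(os.environ))
```

An empty "missing" list means both variables are set. It does not prove the keys are valid.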
Possible Solution
No response
Additional Information/Context
Local AWS Glue environment setup
AWS Java SDK version used
The default SDK that comes with the Glue image, and also 1.11.1000
JDK version used
openjdk version "1.8.0_322"
OpenJDK Runtime Environment Corretto-8.322.06.3 (build 1.8.0_322-b06)
OpenJDK 64-Bit Server VM Corretto-8.322.06.3 (build 25.322-b06, mixed mode)
Operating System and version
Docker Desktop - Windows
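Since two AWS Java SDK versions are mentioned above, one quick thing to try is checking whether the same artifact is present in more than one version in the Spark jars directory. This is only a diagnostic sketch: it assumes jars are named artifact-version.jar, and whether duplicate jars are the actual cause here is not established in this thread.

```python
import os
import re
from collections import defaultdict

# Matches names like "aws-java-sdk-bundle-1.11.1000.jar": the version is the
# part after the first hyphen that is followed by a digit.
JAR_PATTERN = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def duplicate_artifacts(jar_names):
    """Map each artifact that appears in more than one version to its versions."""
    versions = defaultdict(set)
    for name in jar_names:
        match = JAR_PATTERN.match(name)
        if match:
            versions[match.group("artifact")].add(match.group("version"))
    return {
        artifact: sorted(found)
        for artifact, found in versions.items()
        if len(found) > 1
    }


def scan_directory(directory):
    """Run the duplicate check over the file names in a directory."""
    return duplicate_artifacts(os.listdir(directory))
```

In the Glue image this could be called as scan_directory("/home/glue_user/spark/jars/"), the directory the jars were downloaded into in the Dockerfile earlier in the thread.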