JeevaTM opened 2 years ago
I hit the same issue. It seems to be related to the Hive class loader failing to load the credential provider class. I cannot fix it without code changes, but I did find a workaround:
Create core-site.xml in aws-glue-libs/conf/ and replace AK/SK with your access key and secret key:
```xml
<configuration>
  <property>
    <name>fs.s3.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
    <description>s3a filesystem implementation</description>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>AK</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>SK</value>
  </property>
</configuration>
```
Hopefully it works for you.
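If editing the XML by hand feels error-prone, the same core-site.xml can be generated with a short stdlib script. This is just a sketch: the property names and values come from the snippet above, and the output path is an assumption based on the comment.

```python
import xml.etree.ElementTree as ET

# Properties from the workaround above; replace the placeholder
# AK/SK values with your real credentials.
props = {
    "fs.s3.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "fs.s3a.access.key": "AK",
    "fs.s3a.secret.key": "SK",
}

def build_core_site(properties):
    """Build a Hadoop core-site.xml document from a dict of properties."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

xml_text = build_core_site(props)
# Write to aws-glue-libs/conf/core-site.xml (path assumed from the workaround):
# with open("aws-glue-libs/conf/core-site.xml", "w") as f:
#     f.write(xml_text)
print(xml_text)
```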
@JeevaTM I am facing the same problem. Did you find a resolution? I will investigate @xmubeta's solution, but I am not sure yet how to do that.
@labbedaine I used the following Dockerfile to work with Iceberg tables in Glue. It works pretty well:
```dockerfile
FROM amazon/aws-glue-libs:glue_libs_3.0.0_image_01

ENV AWS_ACCESS_KEY_ID="redacted"
ENV AWS_SECRET_ACCESS_KEY="redacted"
ENV DISABLE_SSL=true
ENV AWS_CA_BUNDLE="/etc/pki/ca-trust/source/anchors/ca-bundle.pem"

USER 0
COPY . /home/glue_user/workspace/

COPY cred/ca-bundle.pem /etc/pki/ca-trust/source/anchors/
RUN update-ca-trust

COPY cred/zscaler.crt /home/glue_user/
WORKDIR /usr/lib/jvm/java-1.8.0-amazon-corretto.x86_64/jre/lib/security/
RUN keytool -import -keystore ./cacerts -trustcacerts -file /home/glue_user/zscaler.crt -storepass changeit -noprompt

WORKDIR /home/glue_user/spark/jars/
RUN wget https://repo1.maven.org/maven2/software/amazon/awssdk/bundle/2.15.40/bundle-2.15.40.jar
RUN wget https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark3-runtime/0.12.0/iceberg-spark3-runtime-0.12.0.jar
RUN wget https://repo1.maven.org/maven2/software/amazon/awssdk/url-connection-client/2.15.40/url-connection-client-2.15.40.jar

WORKDIR /home/glue_user/aws-glue-libs/awsglue/scripts/
RUN cat /home/glue_user/workspace/cred/ca-bundle.pem >> /home/glue_user/.local/lib/python3.7/site-packages/certifi/cacert.pem
RUN cat /home/glue_user/workspace/cred/ca-bundle.pem >> /usr/local/lib/python3.7/site-packages/certifi/cacert.pem

RUN chown -R glue_user /home/glue_user/
WORKDIR /home/glue_user/workspace/
USER 10000
```
ca-bundle.pem is my certificate authority bundle and zscaler.crt is my corporate proxy certificate. Update them as required for your environment.
The jar versions in the wget lines are necessary for Iceberg to work. Unfortunately, you cannot use Iceberg 0.13.x with the Glue image: the image ships Spark 3.2 but 3.3 is required, so if you write to Iceberg from this image, columns will be missing in the Glue Data Catalog and the Athena preview. QuickSight cannot read the Iceberg table even with the 'query using Athena' option.
All of the above applies to AWS Glue itself as well, not just the image.
I would suggest using EMR instead, which comes with Iceberg 0.13.x support.
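The wget lines in the Dockerfile above follow the standard Maven Central repository layout (groupId with dots replaced by slashes, then artifactId/version/artifactId-version.jar). A small hypothetical helper makes it easier to pin different jar versions consistently:

```python
def maven_central_url(group_id, artifact_id, version):
    """Build a Maven Central download URL from Maven coordinates."""
    group_path = group_id.replace(".", "/")
    return (
        "https://repo1.maven.org/maven2/"
        f"{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.jar"
    )

# The three jars pinned in the Dockerfile above:
jars = [
    ("software.amazon.awssdk", "bundle", "2.15.40"),
    ("org.apache.iceberg", "iceberg-spark3-runtime", "0.12.0"),
    ("software.amazon.awssdk", "url-connection-client", "2.15.40"),
]
for coords in jars:
    print(maven_central_url(*coords))
```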
I want to set up a local AWS Glue development environment. I created a Glue Catalog database.
I created an IAM user with full S3, Glue, and Athena access, and granted the IAM user Super permission for the Glue Catalog database in AWS Lake Formation.
I started a Glue notebook and ran the following script:
It successfully displayed the databases and created 'jeeva_demo' table in 'db' Glue Catalog database.
Now I wanted to set up the same Glue environment locally to start working on it.
I pulled the Docker image `amazon/aws-glue-libs:glue_libs_3.0.0_image_01` and created a local folder `glue`. My test scripts reside in the `glue` folder.
I ran the following command to start the Glue image:
I am sitting behind an organization firewall, so I updated the Java keystore with `zscaler.crt` and ran `update-ca-trust` with `ca-bundle.pem`.
I ran the same script I ran on AWS Glue locally.
The variable `df` on debug showed the databases I gave the IAM user permission to in AWS Lake Formation. The script would run until the CREATE TABLE line and then throw the error.
I assumed this error was due to the AWS Java SDK for S3 being missing, so I added
`spark.jars.packages com.amazonaws:aws-java-sdk:1.11.1000`
to /home/glue_user/spark/conf/spark-defaults.conf and ran the Python script again. Spark did download the dependencies, but I still get the same error. What am I missing here?
Full trace: