Open taylorbarstow opened 3 years ago
UPDATE
I was able to work around this by adding commons-collections4 as a dependency in pom.xml, running mvn install, and then running:
mvn -f ${GLUE_ROOT}/pom.xml -DoutputDirectory=${SPARK_ROOT}/jars dependency:copy-dependencies
where GLUE_ROOT is the root of this project, and SPARK_ROOT is the root of my Spark install.
If the maintainers think a fix within aws-glue-libs is warranted, I'd be happy to submit a PR. However, I have a hunch that this is due to broken dependencies in the glue ETL jars, in which case this issue may simply go away once the upstream dependency issues are resolved.
Oh wow, thanks for the hint. The same issue is actually present in the AWS Glue docker image (safe to assume it is built from this repository), and I've been banging my head over it. I fixed it just by downloading Apache Commons Collections 4.4 from the Apache website, unpacking it, and copying the jar into Spark's jars directory:
# wget https://downloads.apache.org//commons/collections/binaries/commons-collections4-4.4-bin.zip
# unzip commons-collections4-4.4-bin.zip
# cp commons-collections4-4.4/commons-collections4-4.4.jar /home/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8/jars/
Then restart the docker container and voilà, it works and I can finally work with my data.
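Before restarting, it may be worth confirming that the jar actually landed in the jars directory. A minimal sketch, assuming the jar keeps its default commons-collections4-<version>.jar file name; the function name find_collections4_jars is made up for illustration and is not part of aws-glue-libs:

```python
from pathlib import Path
from typing import List


def find_collections4_jars(jars_dir: str) -> List[Path]:
    """List commons-collections4 jars sitting directly in jars_dir, sorted by path."""
    # Matches e.g. commons-collections4-4.4.jar but not the older
    # commons-collections-3.x.jar, which has no "4" before the version dash.
    return sorted(Path(jars_dir).glob("commons-collections4-*.jar"))
```

For the docker image mentioned above, this could be called as find_collections4_jars("/home/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8/jars"); an empty list would suggest the copy step did not take effect.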
I had a very similar issue using the AWS Glue docker container (glue 1.0). I couldn't load data from XML files using glueContext.create_dynamic_frame_from_options. I fixed it by following @PPFilip's steps to include Apache Commons Collections 4.4 in the jars. I restarted the docker container and it worked.
Thanks a lot @PPFilip and @taylorbarstow , you guys made my day.
I'm having the same issue, but when using the resolveChoice method of DynamicFrame.
df.resolveChoice(choice = "cast:string")
I'm trying to understand where to put that jar.
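Going by the earlier comments, the jar goes into the jars directory under the Spark install root (SPARK_ROOT in the first comment, /home/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8/jars/ in the docker image). A minimal sketch for locating that directory, assuming the SPARK_HOME environment variable points at the Spark install, which may not hold in every setup; the function name spark_jars_dir is hypothetical:

```python
import os
from pathlib import Path
from typing import Optional


def spark_jars_dir(spark_home: Optional[str] = None) -> Path:
    """Return <spark_home>/jars, falling back to the SPARK_HOME environment variable."""
    # Assumption: SPARK_HOME is set when no explicit root is passed.
    root = spark_home or os.environ.get("SPARK_HOME")
    if not root:
        raise RuntimeError("SPARK_HOME is not set; pass the Spark install root explicitly")
    return Path(root) / "jars"
```

If SPARK_HOME is not set, passing the install root directly, for example spark_jars_dir("/home/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8"), gives the same result.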
I've recently started getting the following error when using drop_fields with aws-glue-libs via the aws_glue_libs docker image:
Any hints or pointers on how to dig into this? Nothing has changed with the docker image, so my hunch is that the issue stems from an upstream change in the glue ETL jars. I've tried adding commons-collections4 as a dependency in pom.xml and then running mvn package, but that doesn't solve it.
Any help or directional advice would be appreciated!