Closed: archenroot closed this issue 2 years ago
Could the Glue container be missing the AWS Java SDK bundle and the hadoop-aws dependency? For example, something like:
- PYSPARK_SUBMIT_ARGS=--packages com.amazonaws:aws-java-sdk-bundle:1.11.819,org.apache.hadoop:hadoop-aws:3.2.0 pyspark-shell
The above is from a working Spark 3.0 Jupyter setup with S3 integration via Spark. Thanks for any comments here.
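For reference, the same setting can also be applied from inside a Python script or notebook, as long as it happens before the first SparkSession is created. This is only a minimal sketch: the helper name `build_submit_args` is made up for illustration, and the package versions are simply the ones quoted above, which may not match the Spark/Hadoop build inside the Glue container.

```python
import os

# Maven coordinates copied from the PYSPARK_SUBMIT_ARGS line above.
# Assumption: these versions match the Hadoop build bundled with your Spark.
PACKAGES = [
    "com.amazonaws:aws-java-sdk-bundle:1.11.819",
    "org.apache.hadoop:hadoop-aws:3.2.0",
]


def build_submit_args(packages):
    """Hypothetical helper: join Maven coordinates into a PYSPARK_SUBMIT_ARGS value."""
    return "--packages " + ",".join(packages) + " pyspark-shell"


# Must be set before pyspark launches the JVM, i.e. before the
# first SparkSession / SparkContext is created in this process.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args(PACKAGES)
```

If the JVM is already running (for example in a notebook kernel that has created a session), changing the variable has no effect until the kernel is restarted.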
I am closing this, as I found a way to replace the core-site.xml config.
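The exact contents of the replaced file were not posted, so the following is only a guess at what such a core-site.xml might contain for a local MinIO endpoint. The `fs.s3a.*` property names are standard Hadoop S3A settings; the endpoint URL, the credentials, and the helper `render_core_site` are placeholders invented for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical values for a local MinIO container; replace with your own.
S3A_PROPERTIES = {
    "fs.s3a.endpoint": "http://minio:9000",        # assumed MinIO address
    "fs.s3a.access.key": "minio-access-key",       # placeholder credential
    "fs.s3a.secret.key": "minio-secret-key",       # placeholder credential
    "fs.s3a.path.style.access": "true",            # MinIO is usually addressed path-style
    "fs.s3a.connection.ssl.enabled": "false",      # plain HTTP in a local setup
}


def render_core_site(properties):
    """Hypothetical helper: render a dict as Hadoop <configuration> XML."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


core_site_xml = render_core_site(S3A_PROPERTIES)
```

The resulting string would then be written over the core-site.xml that the container's Hadoop configuration directory provides; where that file lives depends on the image.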
As part of setting up a local Docker-based development environment, I got MinIO S3 storage up and running. The mc command-line client (provided by MinIO) and some boto3-based scripts work just fine, but I cannot get the PySpark script below to work:
I get this log:
Could it be a missing Hadoop library? Or are there any other hints?