awslabs / python-deequ

Python API for Deequ
Apache License 2.0

Error encountered: Py4JJavaError in s3a #28

Closed: nymango closed this issue 3 years ago

nymango commented 3 years ago

I was working through the steps in the suggestions notebook and hit an error in the cell df = spark.read.parquet("s3a://amazon-reviews-pds/parquet/product_category=Electronics/"). I believe the classpath is correct, since I pasted the code in directly. The error is an AccessDenied on the bucket:

Py4JJavaError: An error occurred while calling o43.parquet. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.nio.file.AccessDeniedException: s3a://amazon-reviews-pds/parquet/product_category=Electronics/part-00000-495c48e6-96d6-4650-aa65-3c36a3516ddd.c000.snappy.parquet: getFileStatus on s3a://amazon-reviews-pds/parquet/product_category=Electronics/part-00000-495c48e6-96d6-4650-aa65-3c36a3516ddd.c000.snappy.parquet: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: PAD4YWD5G92K8CHD;

Are there credentials that need to be set?

nymango commented 3 years ago

I built a new SageMaker classic notebook instance in a VPC, used the role created by the console, set the security group to allow inbound traffic on port 9889, and opened the conda_python3 kernel. The issue didn't manifest again.
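
For anyone who lands here with the same 403 and cannot rebuild the notebook: a rough sketch of how the S3A credentials provider could be set explicitly on the Spark session. This is not what resolved the issue above (a fresh notebook with the console-created role did), and it is not taken from the pydeequ notebooks. The helper names s3a_credential_options and build_session are made up for illustration. The provider class names assume the hadoop-aws connector with AWS SDK v1, which matches the com.amazonaws classes in the stack trace, and may differ in other versions.

```python
# Hypothetical helper (not part of pydeequ or Spark): maps a short mode name
# to Spark options that select an S3A credentials provider.
# Class names assume hadoop-aws built against AWS SDK v1.
S3A_PROVIDERS = {
    # Environment variables, ~/.aws/credentials, then instance/container role.
    "default": "com.amazonaws.auth.DefaultAWSCredentialsProviderChain",
    # Only the role attached to the EC2/SageMaker instance.
    "instance_role": "com.amazonaws.auth.InstanceProfileCredentialsProvider",
    # Unsigned requests; only works for buckets that allow public reads.
    "anonymous": "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider",
}


def s3a_credential_options(mode="default"):
    """Return Spark config entries selecting the S3A credentials provider."""
    if mode not in S3A_PROVIDERS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(S3A_PROVIDERS)}")
    return {"spark.hadoop.fs.s3a.aws.credentials.provider": S3A_PROVIDERS[mode]}


def build_session(mode="default"):
    """Create a SparkSession with the chosen provider (requires pyspark)."""
    # Imported lazily so s3a_credential_options() can be used without pyspark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("s3a-credentials-check")
    for key, value in s3a_credential_options(mode).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Usage would look like spark = build_session("instance_role") followed by the same spark.read.parquet(...) call from the notebook. The Deequ classpath settings from the notebook would still need to be added to the builder, and whichever role or credentials are picked up must actually have read access to the bucket.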