Open danipilze opened 1 year ago
Hi @danipilze, the Amazon reviews dataset has indeed been removed and will no longer be available. We have generated a synthetic reviews dataset and are in the process of updating PyDeequ blogs and tutorials with it.
Please use the following S3 path to access the data and follow the tutorial: s3a://aws-bigdata-blog/generated_synthetic_reviews/data/product_category=Electronics/
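For anyone following along, here is a minimal sketch of reading that path with PySpark. It assumes a Spark session that already has the S3A connector (hadoop-aws) on its classpath and AWS credentials available to it. The names `SYNTHETIC_REVIEWS_PATH` and `load_reviews` are made up for illustration and are not part of PyDeequ or the tutorial.

```python
# Path quoted in the comment above.
SYNTHETIC_REVIEWS_PATH = (
    "s3a://aws-bigdata-blog/generated_synthetic_reviews/"
    "data/product_category=Electronics/"
)


def load_reviews(spark=None, path=SYNTHETIC_REVIEWS_PATH):
    """Read the Parquet files under `path` into a Spark DataFrame.

    `load_reviews` is a hypothetical helper. If no session is passed in,
    one is created; this assumes pyspark is installed and the session
    can reach S3 through the s3a:// scheme.
    """
    if spark is None:
        # Imported lazily so the module can be loaded without pyspark.
        from pyspark.sql import SparkSession

        spark = SparkSession.builder.appName("synthetic-reviews").getOrCreate()
    return spark.read.parquet(path)
```

In a notebook that already has a `spark` session, `load_reviews(spark).printSchema()` should show the columns if the read succeeds.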
Thanks @komashk, that's very helpful for the classes I give on profiling with Deequ!
Hi
I saw you updated the links in the tutorial a few weeks ago, but now I'm getting this error again from SageMaker
thanks
Caused by: java.nio.file.AccessDeniedException: s3a://aws-bigdata-blog/generated_synthetic_reviews/data/product_category=Electronics/4a0890eb4878486da735e7d091da28fc_0.snappy.parquet: getFileStatus on s3a://aws-bigdata-blog/generated_synthetic_reviews/data/product_category=Electronics/4a0890eb4878486da735e7d091da28fc_0.snappy.parquet: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: JR7DK1YB78DDDFQK; S3 Extended Request ID: ac6w1k2llf5qZxQ+xDDnZryhEcp5ftMtvzTv8cjZnXVcvL9B4e4Ipf2NKpJNpjeDvLqADlG6Pxs=; Proxy: null), S3 Extended Request ID: ac6w1k2llf5qZxQ+xDDnZryhEcp5ftMtvzTv8cjZnXVcvL9B4e4Ipf2NKpJNpjeDvLqADlG6Pxs=:403 Forbidden
@danielfrai Thank you for letting us know! Looking into it.
@danielfrai I confirm that the data is accessible for me (tested in Glue/downloaded via CLI). How are you trying to access it?
Thanks for checking, @komashk. I was trying from SageMaker, but I have access now, so no problem.
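If someone hits the same 403 later, one way to tell whether it comes from the notebook's credentials rather than from Spark is to list the prefix outside Spark with the same role. The sketch below is one possible check using boto3's `list_objects_v2`. The helper names `split_s3_uri` and `list_first_keys` are hypothetical, and the code assumes boto3 is installed and picks up the notebook's credentials.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3://, s3a:// or s3n:// URI into (bucket, key_prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("s3", "s3a", "s3n") or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def list_first_keys(uri, client=None, max_keys=5):
    """Return up to `max_keys` object keys found under the given prefix.

    Hypothetical helper. An access problem such as the 403 above is
    expected to surface here as an exception raised by boto3.
    """
    if client is None:
        # Imported lazily; assumes boto3 is available in the environment.
        import boto3

        client = boto3.client("s3")
    bucket, prefix = split_s3_uri(uri)
    response = client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=max_keys)
    # "Contents" is absent when no keys match the prefix.
    return [obj["Key"] for obj in response.get("Contents", [])]
```

The AWS CLI equivalent would be `aws s3 ls s3://aws-bigdata-blog/generated_synthetic_reviews/data/product_category=Electronics/`, which uses the `s3://` scheme rather than `s3a://`.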
Describe the bug The tutorial dataset amazon-reviews-pds is no longer available. According to this Reddit thread, it has been removed: https://www.reddit.com/r/dataengineering/comments/15ohj6q/trouble_accessing_the_amazon_reviews_dataset_in/
To Reproduce Steps to reproduce the behavior:
Expected behavior Read the dataframe without issues
Environment: Amazon SageMaker notebook instance, Jupyter notebook