Open RomeLeader opened 4 years ago
We get a similar issue: when a file is not in S3, an empty DataFrame is still created. Shouldn't this raise an exception?
22/06/30 08:52:18 WARN HadoopDataSource: Skipping Partition {} as no new files detected @ s3://sample-bucket/test/dict_most_common_names_old.csv or path does not exist
Empty DataFrame
Columns: []
Index: []
<class 'pandas.core.frame.DataFrame'>
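Until Glue raises on a missing path itself, one workaround is to fail fast in the job. This is a minimal sketch of such a guard; `require_nonempty` is a hypothetical helper, not a Glue API, and the S3 paths are placeholders:

```python
import pandas as pd

def require_nonempty(df: pd.DataFrame, source: str) -> pd.DataFrame:
    # Glue can silently hand back an empty frame when the S3 path does not
    # exist (see the warning above); raise instead of continuing with no data.
    if df.empty:
        raise FileNotFoundError(f"No rows read from {source}; the path may not exist")
    return df

# Normal case: a non-empty frame passes through unchanged.
ok = require_nonempty(pd.DataFrame({"a": [1]}), "s3://sample-bucket/test/data.csv")

# Missing-file case: the empty frame triggers the exception.
try:
    require_nonempty(pd.DataFrame(), "s3://sample-bucket/test/missing.csv")
except FileNotFoundError as e:
    print("raised:", e)
```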
I experienced the same error. It turned out my Glue job just did not have enough permissions, so you may want to check your assigned role.
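For reference, a Glue job role typically needs at least `s3:GetObject` on the objects and `s3:ListBucket` on the bucket to read input data. A minimal sketch of such a policy (the bucket name and prefix are placeholders from the log above; your job may need more, e.g. access to script and temp buckets):

```python
import json

# Assumed minimal read-only S3 permissions for the Glue job role.
# "sample-bucket" and the "test/" prefix are illustrative only.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::sample-bucket/test/*",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::sample-bucket",
        },
    ],
}
print(json.dumps(policy, indent=2))
```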
What permissions were you missing @MyJBMe ?
Hi,
My log bucket is fairly large, but we have moved anything older than three months to Glacier. When I run the job, it completes in a minute or two and I get the following:
where [bucket name] is the name of my S3 access log storage bucket.
My logs are being saved at the top level of the S3 bucket, i.e. all log files are at s3:///
What could be happening here? I know there are logs in the bucket that are not partitioned, and the converted DB/tables are empty when I preview them. I set the classification of the raw data table to CSV, but I am not sure that is correct.
Any pointers would be appreciated!
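One thing worth checking for the Glacier question above: S3 reports transitioned objects with a `StorageClass` of `GLACIER` or `DEEP_ARCHIVE` in `list_objects_v2` responses, and Glue/Athena generally cannot read those objects until they are restored, which could explain a fast-completing job with empty output. A sketch that filters a `list_objects_v2`-style listing down to directly readable keys (the response shape matches the real S3 API; the sample data is made up):

```python
def readable_keys(contents):
    """Return keys from a list_objects_v2 'Contents' list that are not
    archived to Glacier (archived objects must be restored before a
    Glue/Athena query can read them)."""
    archived = {"GLACIER", "DEEP_ARCHIVE"}
    return [obj["Key"] for obj in contents if obj.get("StorageClass") not in archived]

# Illustrative listing: one old Glaciered log, one recent standard-class log.
sample = [
    {"Key": "2023-01-01-access-log", "StorageClass": "GLACIER"},
    {"Key": "2024-06-01-access-log", "StorageClass": "STANDARD"},
]
print(readable_keys(sample))
```

If most of the bucket shows up as archived, the job may simply have nothing readable to process.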