We are testing version 6.0.0 of the tool using Glue 3.0 and have noticed that some access log data is being deduped when files are converted into hive/parquet format.
This is concerning because the Athena query output does not show which file was deleted. It appears the second s3 access log entry is overwriting the first when the file is converted.
We are testing version 6.0.0 of the tool using Glue 3.0 and have noticed that some access log data is being deduped when files are converted into hive/parquet format.
An example from our access logs are
The Athena query output is as follows
This is concerning because the Athena query output does not show which file was deleted. It appears the second s3 access log entry is overwriting the first when the file is converted.
Thank you for taking the time to look into this.