Open PettitWesley opened 1 year ago
As noted here, there are potentially two bugs/needed enhancements, and the second is to support non-UTC timestamps for S3: https://github.com/aws/aws-for-fluent-bit/issues/432#issuecomment-1309413465
@PettitWesley I assume this is still a known issue and is being worked on and/or tracked? We just saw this on a cluster with a large number of pods (41). The folder we wrote to within S3 had the correct date, but the timestamp of the gzip file itself was from yesterday; the contents, however, are from today.
Would it make sense to update the "last modified date" of the gzip file just prior to uploading?
@cdancy 2.31.3 (https://github.com/aws/aws-for-fluent-bit/releases/tag/v2.31.3) has that feature, but we later removed it: a recent change in S3 (possibly that one, possibly another) added instability, so we reverted all recent S3 changes.
All of the S3 fixes will come back soonish, once I complete the S3 stability refactor (code complete and tested, but one core change is still pending to enable it): https://github.com/PettitWesley/fluent-bit/pull/24
@PettitWesley thanks for getting back. We'll keep following for now. Not a show-stopper, but something we noticed while trying to debug logs that left us scratching our heads, wondering if we were sane or not :)
I'm not sure that the bug fix for https://github.com/aws/aws-for-fluent-bit/issues/459#issuecomment-1622111965 above will resolve the issue
We are using Fluent Bit to write logs to S3 and then using Athena partitioning to query the logs.
For example, a file is written to S3 with the path year=2023/month=10/day=04/hour=03/somefile.gz and queried with:
SELECT * FROM servicelogs WHERE year = 2023 and month = 10 and day = 4 and hour = 3;
If the file receives its S3 prefix from the time of the first log record, this file could also contain records from hour 4, which a query on hour = 3 would return and a query on hour = 4 would miss.
Ideally Fluent Bit would cut over to a new file at the partition change.
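To make the problem concrete, here is a minimal Python sketch of the two behaviours. It is not Fluent Bit code: `partition_prefix` and `group_by_partition` are hypothetical helpers, the sample records are made up, and UTC timestamps are assumed. It only shows how a prefix taken from the first record differs from grouping each record by its own hour.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_prefix(ts: datetime) -> str:
    """Build a Hive-style prefix in the same layout as the path above (hypothetical helper)."""
    return f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}"


def group_by_partition(records):
    """Group (timestamp, line) pairs by the hour partition of each record (hypothetical helper)."""
    groups = defaultdict(list)
    for ts, line in records:
        groups[partition_prefix(ts)].append(line)
    return dict(groups)


# Made-up sample records that straddle the hour 3 / hour 4 boundary.
records = [
    (datetime(2023, 10, 4, 3, 58, tzinfo=timezone.utc), "a"),
    (datetime(2023, 10, 4, 3, 59, tzinfo=timezone.utc), "b"),
    (datetime(2023, 10, 4, 4, 1, tzinfo=timezone.utc), "c"),
]

# Prefix taken from the first record only: all three records land under hour=03.
first_record_prefix = partition_prefix(records[0][0])

# Cutting over at the partition change: record "c" gets its own hour=04 file.
by_partition = group_by_partition(records)
```

With the first approach, record "c" is stored under hour=03 even though it belongs to hour 4. With the second, each output file only holds records from its own partition, so the Athena query above returns exactly the records for that hour.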