Closed newpavlov closed 12 months ago
I think you ran through the following within one sec:
Rcurrent.log ——- rotate: Ts1.log ——- compress: Ts1.log.gz ——- next log line is written: Rcurrent.log Ts1.log.gz ——- rotate again: Ts1.log Ts1.log.gz ——- compress - boom Ts1.log.gz
This wasn‘t possible without the explicitly triggered rotation. Fix should not be difficult.
I am not sure I follow. So during the first rotate Ts1
is timestamp of rotation and for the second one timestamp of the first log line? Why such discrepancy?
It does not look like the issue gets triggered for further rotation as long as the app does not get restarted.
I think I will try the TimestampsDirect
naming to work around this issue.
Ts1 is meant to be in all cases the timestamp of the infix in the filename. It has precision of second, that‘s why the collision can happen if all actions are done in a single second.
Collisions during rotate are detected and escaped with adding restartXXX to the infix. The same is missing in the compress step.
Yes, lack of collision detection during compression is an issue, you probably should use something like NamedTempFile::persist_noclobber
. But there is a different issue with timestamps used to name rotatef files. Note that between manual rotations passes more than one second.
Let's say I have app_rCURRENT.log
created at 18:25 (i.e. it's creation/birth time, not access or modification time) and nothing more. On 18:30 I launch my app and pass "rotate" command after a bunch of logs were written. Now I have two log files: current
and 18-25-xx.log.gz
. The current file has creation time 18:30.
After 5 minutes during which a bunch of logs were written into the current file I execute rotation command again. I would expect to get 3 files: current
, 18-30-yy.log.gz
, and 18-25-xx.log.gz
, but I get only two: current
and 18-25-xx.log.gz
, with data which should've been saved into 18-30-y.log.gz
overwriting 18-25-xx.log.gz
.
The third manual rotation at 18:40 which contains logs from 18:35 to 18:40 gets rotated with name 18:30-zz.log.gz
.
So it looks like the code chooses timestamp for rotated files incorrectly. The first rotation after cold start uses "current" file creation timestamp, while the subsequent ones use incorrectly cached values.
The issue may not be even specific to manual rotation, it's just the easiest way to trigger it.
I think the issue is caused by this line. You use creation date of rotated file for naming next rotated file. It's wrong by itself as can be seen in my previous comment (you should use current time instead), but it causes name collision when appending is enabled. During initialization you use creation date of current file, which inevitable becomes creation date of next rotation file since you use fs::rename
which IIUC preserves creation date.
Issues are now hopefully all fixed with 0.27.1.
Thanks for reporting and the discussions!
As per my two previous comments, I don't think this issue is fully fixed. Yes, v0.27.1 no longer overwrites previously rotated log files on compression, but it still uses arguably incorrect timestamps for naming rotated files.
After rotating twice today with existing current file created two days ago I got the following files:
app_r2023-09-25_15-20-38.log.gz 2.46 KiB
app_r2023-09-25_15-20-38.restart-0000.log.gz 344 B
The second file contains records only from 2023-09-27 16:17:42 to 2023-09-27 16:17:48.
Sorry for not replying to this part directly. I was distracted by testing and resolving the naming conflicts that also existed. I Indeed I did not understand your point with the second rotation, your last comment changed that. It should be solved now with 0.27.2.
Thank you! I confirm that 0.27.2 has fixed the issue.
I have an app which allow to remotely manage logs. It uses
Cleanup::KeepCompressedFiles
, i.e. it compresses logs immediately after rotation. It has commands to list, rotate, and copy (rotated) logs.After cold start of the application with existing logs, the list command produces the following output:
Next, the rotate command is executed with the following result (it returns only rotated and compresses logs):
Listing logs result in expected result:
So far so good. But if I manually rotate again (more than one second later), the last 2 KB log file gets replaced with new data:
Parts of relevant code:
Rotating logs after that without restarting the application works without issues.
I can not provide a minimal reproduction of the issue right now. Hopefully, you will be able to resolve the issue with provided information.