emabee / flexi_logger

A flexible logger for rust programs that can write to stderr, stdout, and/or to log files
Apache License 2.0
307 stars 50 forks source link

Rotation may overwrite existing log file #150

Closed newpavlov closed 12 months ago

newpavlov commented 1 year ago

I have an app which allow to remotely manage logs. It uses Cleanup::KeepCompressedFiles, i.e. it compresses logs immediately after rotation. It has commands to list, rotate, and copy (rotated) logs.

After cold start of the application with existing logs, the list command produces the following output:

$ log list
app_r2023-09-22_16-41-41.log.gz 993 B
app_r2023-09-22_16-41-49.log.gz 349 B
app_r2023-09-22_16-42-44.log.gz 381 B
app_r2023-09-22_16-43-10.log.gz 383 B
app_rCURRENT.log        5165 B

Next, the rotate command is executed with the following result (it returns only rotated and compresses logs):

$ log rotate
app_r2023-09-22_16-41-41.log.gz 993 B
app_r2023-09-22_16-41-49.log.gz 349 B
app_r2023-09-22_16-42-44.log.gz 381 B
app_r2023-09-22_16-43-10.log.gz 383 B
app_r2023-09-22_16-43-38.log.gz 2026 B

Listing logs result in expected result:

$ log list
app_r2023-09-22_16-41-41.log.gz 993 B
app_r2023-09-22_16-41-49.log.gz 349 B
app_r2023-09-22_16-42-44.log.gz 381 B
app_r2023-09-22_16-43-10.log.gz 383 B
app_r2023-09-22_16-43-38.log.gz 2026 B
app_rCURRENT.log        582 B

So far so good. But if I manually rotate again (more than one second later), the last 2 KB log file gets replaced with new data:

$ log rotate
app_r2023-09-22_16-41-41.log.gz 993 B
app_r2023-09-22_16-41-49.log.gz 349 B
app_r2023-09-22_16-42-44.log.gz 381 B
app_r2023-09-22_16-43-10.log.gz 383 B
app_r2023-09-22_16-43-38.log.gz 398 B

$ log list
app_r2023-09-22_16-41-41.log.gz 993 B
app_r2023-09-22_16-41-49.log.gz 349 B
app_r2023-09-22_16-42-44.log.gz 381 B
app_r2023-09-22_16-43-10.log.gz 383 B
app_r2023-09-22_16-43-38.log.gz 398 B
app_rCURRENT.log        581 B

Parts of relevant code:

fn log_list_inner(&self) -> Result<Vec<(String, u64)>, Error> {
    let selector = LogfileSelector::default()
        .with_compressed_files()
        .with_r_current();
    self.logger
        .existing_log_files(&selector)?
        .into_iter()
        .map(path2name_size)
        .collect()
}

fn log_rotate_inner(&self) -> Result<Vec<(String, u64)>, Error> {
    self.logger.flush();
    self.logger.trigger_rotation()?;
    let mut res = Vec::new();
    let selector = LogfileSelector::default().with_compressed_files();
    for _ in 0..ROTATE_SLEEP_NUM {
        std::thread::sleep(ROTATE_SLEEP);
        res = self.logger.existing_log_files(&selector)?;
        if res.iter().all(is_log_gz) {
            break;
        }
    }
    res.into_iter().map(path2name_size).collect()
}

Rotating logs after that without restarting the application works without issues.

I can not provide a minimal reproduction of the issue right now. Hopefully, you will be able to resolve the issue with provided information.

emabee commented 1 year ago

I think you ran through the following within one sec:

Rcurrent.log ——- rotate: Ts1.log ——- compress: Ts1.log.gz ——- next log line is written: Rcurrent.log Ts1.log.gz ——- rotate again: Ts1.log Ts1.log.gz ——- compress - boom Ts1.log.gz

This wasn‘t possible without the explicitly triggered rotation. Fix should not be difficult.

newpavlov commented 1 year ago

I am not sure I follow. So during the first rotate Ts1 is timestamp of rotation and for the second one timestamp of the first log line? Why such discrepancy?

It does not look like the issue gets triggered for further rotation as long as the app does not get restarted.

I think I will try the TimestampsDirect naming to work around this issue.

emabee commented 1 year ago

Ts1 is meant to be in all cases the timestamp of the infix in the filename. It has precision of second, that‘s why the collision can happen if all actions are done in a single second.

Collisions during rotate are detected and escaped with adding restartXXX to the infix. The same is missing in the compress step.

newpavlov commented 1 year ago

Yes, lack of collision detection during compression is an issue, you probably should use something like NamedTempFile::persist_noclobber. But there is a different issue with timestamps used to name rotatef files. Note that between manual rotations passes more than one second.

Let's say I have app_rCURRENT.log created at 18:25 (i.e. it's creation/birth time, not access or modification time) and nothing more. On 18:30 I launch my app and pass "rotate" command after a bunch of logs were written. Now I have two log files: current and 18-25-xx.log.gz. The current file has creation time 18:30.

After 5 minutes during which a bunch of logs were written into the current file I execute rotation command again. I would expect to get 3 files: current, 18-30-yy.log.gz, and 18-25-xx.log.gz, but I get only two: current and 18-25-xx.log.gz, with data which should've been saved into 18-30-y.log.gz overwriting 18-25-xx.log.gz.

The third manual rotation at 18:40 which contains logs from 18:35 to 18:40 gets rotated with name 18:30-zz.log.gz.

So it looks like the code chooses timestamp for rotated files incorrectly. The first rotation after cold start uses "current" file creation timestamp, while the subsequent ones use incorrectly cached values.

The issue may not be even specific to manual rotation, it's just the easiest way to trigger it.

newpavlov commented 12 months ago

I think the issue is caused by this line. You use creation date of rotated file for naming next rotated file. It's wrong by itself as can be seen in my previous comment (you should use current time instead), but it causes name collision when appending is enabled. During initialization you use creation date of current file, which inevitable becomes creation date of next rotation file since you use fs::rename which IIUC preserves creation date.

emabee commented 12 months ago

Issues are now hopefully all fixed with 0.27.1.

Thanks for reporting and the discussions!

newpavlov commented 12 months ago

As per my two previous comments, I don't think this issue is fully fixed. Yes, v0.27.1 no longer overwrites previously rotated log files on compression, but it still uses arguably incorrect timestamps for naming rotated files.

After rotating twice today with existing current file created two days ago I got the following files:

app_r2023-09-25_15-20-38.log.gz 2.46 KiB
app_r2023-09-25_15-20-38.restart-0000.log.gz    344 B

The second file contains records only from 2023-09-27 16:17:42 to 2023-09-27 16:17:48.

emabee commented 12 months ago

Sorry for not replying to this part directly. I was distracted by testing and resolving the naming conflicts that also existed. I Indeed I did not understand your point with the second rotation, your last comment changed that. It should be solved now with 0.27.2.

newpavlov commented 12 months ago

Thank you! I confirm that 0.27.2 has fixed the issue.