gabime / spdlog

Fast C++ logging library.

Question on spdlog rotation size #3265

Closed dentiny closed 2 hours ago

dentiny commented 4 days ago

Hi team, I have a question on max size for rotating file sink: https://github.com/gabime/spdlog/blob/1245bf8e8a19d914271df03124da9593f3634a9c/include/spdlog/sinks/rotating_file_sink.h#L24-L28

My understanding of the parameter is:

But after reading through the code, I found the implementation is:

I guess the reason is: if we synced every time the accumulated data size exceeded the configured max size, as in:

if (accumulated_size >= max_size_) {
  fflush(file_);  // flush the user-space buffer to the page cache
  fsync(fd_);     // force the page cache onto disk
}

we would have unintended and unexpected latency on the critical path, since logging is expected to be cheap and low-latency. So the design choice here is to let the library decide when to flush data to the page cache, and let the filesystem decide when to persist it to disk.

If I'm not misunderstanding, spdlog doesn't provide any guarantee on the rotation size; in theory each log file could grow unboundedly. I feel the best way to implement this would be to leverage a background thread or async IO.

gabime commented 4 days ago

filehelper.size() > 0

This was added to deal with full disk cases #2261.

In theory I think you are right, although I didn't see any issue reports on this. Maybe there are other ways to fix #2261.

dentiny commented 4 days ago

In theory I think you are right, although I didn't see any issue reports on this.

Hi @gabime thanks for the reply!

I'm happy to provide more context. I hit some issues when using ray; for the production issue, see: https://github.com/ray-project/ray/issues/45678 The TLDR is: people see logs become too big in production. Due to the issue I mentioned above, spdlog doesn't have full control over the overall log file size, so people using spdlog as their logging library don't get a hard cap on the disk usage upper limit.

As for the issue you've mentioned (https://github.com/gabime/spdlog/issues/2261), my proposed solution would be:

Actually this is what glog does: https://github.com/google/glog/blob/6c5c692c8e423f651c74de9477ff0b5a59008bcc/src/logging.cc#L1090-L1148

Pseudocode looks like the following:

// Invoked right before the function returns.
absl::Cleanup cleanup = [] { delete_oldest_log_if_necessary(); };

if (cur_size >= max_size) {
  file_handle = nullptr;   // close the full file, forcing rotation
}
if (file_handle == nullptr) {
  if (!create_new_file()) {
    return;                // creation failed (e.g. disk full); drop the line
  }
}

// Write the content to the log as normal.

The caveat is:

gabime commented 4 days ago

people see log become too big in production

Can you point me to those issues? Seems that https://github.com/ray-project/ray/issues/45678 was about not using the rotating log at all.

dentiny commented 4 days ago

Can you point me to those issues? Seems that ray-project/ray#45678 was about not using the rotating log at all.

The TLDR is:

gabime commented 4 days ago

But users find these hard caps are not respected; specifically, the generated file size is larger than (max size * max file count).

It could be something else; I see there's a discussion about env params, some log_monitor.py, and branches to be merged.

Please provide a simple way to reproduce before we search for solutions.

dentiny commented 4 days ago

Hi @gabime , sorry for the confusion. I think for the issue raised in ray, there are two things:

The issue I want to discuss in this thread is the first item.

Assume you set max size = 10 and file number = 1. What's the expected behavior if you log "helloworld" twice?

My expectation is that only one file exists on disk, with content "helloworld" and size 10. But due to the implementation I detailed in the issue description, there's no guarantee what it will be.

gabime commented 3 days ago

Please provide code to reproduce. Otherwise those are just guesses.

dentiny commented 2 hours ago

Hi @gabime , it turns out there's a bug in ray, not any issue related to spdlog. Thank you very much for the help!