Delgan / loguru

Python logging made (stupidly) simple

Does the compression task block the main thread? #243

Closed mahadi closed 4 years ago

mahadi commented 4 years ago

Hi, I have a program running with several threads. Each of them uses queues to buffer its workload while the other threads are running. My main engine runs in the main thread and uses the queues to distribute and forward work. Also in the main module, the logger is configured with a custom log rotation function (see #241).

I am monitoring the queue loads and I can see that while log compression happens (after 1 GiB, with bz2 it takes ~2.5 minutes) my queues only fill up and cannot be worked off by my main thread during that time.

So I thought about moving my engine into a separate thread as well. But really, the logging should be the part that runs in a separate thread.

Can you tell me how this is managed in loguru? Does the log processing run in a separate thread?

I guess the problem is related to the compression itself. It is running synchronously...

Delgan commented 4 years ago

Yes, the compression takes place in the same thread the messages are logged from. If you configure your sink with enqueue=True, messages and compression will both be handled in a separate thread. Would that solve your problem?
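
For reference, a minimal sketch of such a configuration (the file name is a placeholder, and the rotation size / compression format simply mirror the numbers mentioned above):

from loguru import logger

# With enqueue=True, the calling thread only pushes the record onto an internal
# queue; writing, rotation and compression are performed by a background worker.
logger.add("app.log", rotation="1 GB", compression="bz2", enqueue=True)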

mahadi commented 4 years ago

I will give it a try.

mahadi commented 4 years ago

Unfortunately it didn't work. I will investigate my code further. The engine already runs in a separate thread...

Delgan commented 4 years ago

Hmm... I ran a few tests and it seems to work as expected, in line with what I said about enqueue.

from loguru import logger
import sys
import time

def compression(filepath):
    time.sleep(5)

def rotation(message, file):
    return message.record["extra"].get("rotate", False)

logger.add("test.log", compression=compression, rotation=rotation, enqueue=False)

logger.info("Before")
logger.bind(rotate=True).info("...")
logger.info("After")

With enqueue=False, there is a five-second gap in the logs displayed on sys.stderr:

2020-04-14 19:09:44.863 | INFO     | __main__:<module>:14 - Before
2020-04-14 19:09:44.863 | INFO     | __main__:<module>:15 - ...
2020-04-14 19:09:49.868 | INFO     | __main__:<module>:16 - After

With enqueue=True, there is no gap; the main thread is not blocked:

2020-04-14 19:10:06.004 | INFO     | __main__:<module>:14 - Before
2020-04-14 19:10:06.004 | INFO     | __main__:<module>:15 - ...
2020-04-14 19:10:06.005 | INFO     | __main__:<module>:16 - After

However, one thing to note that may explain your problem: if you call logger.stop(), it will block the main thread until the handler is properly terminated (which implies waiting for the compression to complete), even if enqueue=True.
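
As a hedged illustration of that last point (the handler and file name are hypothetical; logger.remove() is used here to stop the handler):

from loguru import logger

handler_id = logger.add("app.log", compression="bz2", enqueue=True)
logger.info("Some work...")

# Removing the handler waits for the queue worker to flush pending records,
# including any compression triggered by rotation, so this call can block.
logger.remove(handler_id)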

mahadi commented 4 years ago

I am still digging... Blocking the main thread should actually not be a problem, because the engine also runs in a separate thread. What I could see is that the threads which use a network connection seem to become somehow unresponsive. The network requests they make seem to fail, but nothing gets logged (normally a failing request would be logged). My internal monitoring thread, however, is still producing data...

I don't understand how the network stuff is related to the compression in this case...

Delgan commented 4 years ago

Is your program using multiprocessing somehow? Are you using the latest version of Loguru, v0.4.1?

What you describe looks like what would happen if the other threads were blocked while one thread executes the compression. This would be entirely possible, because each sink is protected by a Lock, so a thread can't write its message while another one hasn't completely finished (including the time spent compressing). However, if enqueue=True, this theoretically can't happen, because messages are queued and processed in another thread: https://github.com/Delgan/loguru/blob/e2a337a40d1fb627d88cbe05d61b733b0ff1c92e/loguru/_handler.py#L169-L180
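
To make the difference concrete, here is a rough, simplified sketch of the two behaviours (plain threading primitives, not loguru's actual implementation):

import queue
import threading

# enqueue=False (simplified): every logging thread must take the sink's lock
# itself, so a compression running under that lock stalls all of them.
# enqueue=True (simplified): logging threads only enqueue the record; a single
# worker thread owns the lock and does the writing, rotation and compression.

records = queue.SimpleQueue()

def queue_worker(write):
    while True:
        record = records.get()
        if record is None:   # sentinel used to stop the worker
            break
        write(record)        # writing + rotation + compression happen here

worker = threading.Thread(target=queue_worker, args=(print,), daemon=True)
worker.start()

records.put("a log record")  # the caller returns immediately
records.put(None)
worker.join()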

mahadi commented 4 years ago

Multiprocessing is not used, only threads. I am still wondering how this can happen. I can only work on this again around the weekend; I will try to instrument the code to get more insights.

mahadi commented 4 years ago

One question:

Do I need to set enqueue=True on both logging sinks? Currently I have it only on the file sink.

    logger.add(sys.stdout,
               level="INFO",
               format=LOG_FORMAT)
    logger.add(logfile_path,
               level="INFO",
               format=LOG_FORMAT,
               rotation=rotator.should_rotate,
               # compression="bz2",
               enqueue=True)

Delgan commented 4 years ago

You don't really need to. The main purpose of enqueue is to protect your sink from being called at the same time by multiple processes, which would result in garbled logs. This parameter can also be useful to prevent logging calls from blocking, as in your case. Since you don't use multiprocessing, logging to sys.stdout should be safe without enqueue=True.

mahadi commented 4 years ago

I just observed the behaviour "live": during the compression everything becomes very, very slow, but my threads don't stop entirely. Some of them seem to get scheduled with considerable delay, and I am getting timeout errors for some of the network requests. After the compression everything is back to normal. Since Python doesn't have thread prioritization, I could imagine that the compression thread eats up the major part of the CPU time assigned to my Python process.

Delgan commented 4 years ago

Yeah, that's a possibility. Maybe you can work around it by using a custom compression function and compressing your file in another spawned process. :thinking:

mahadi commented 4 years ago

Yes, I also thought of something like that... I will let you know my solution once it is implemented.

mahadi commented 4 years ago

Tried the following:

import subprocess

def compress_in_subprocess(filepath):
    # spawn bzip2 asynchronously; the shell redirection writes the compressed copy next to the original file
    subprocess.Popen(f"bzip2 -c {filepath} > {filepath}.bz2", shell=True)

logger.add(logfile_path,
    level="INFO",
    format=LOG_FORMAT,
    rotation=rotator.should_rotate,
    compression=compress_in_subprocess,
    enqueue=True  # should actually not be necessary anymore
)

What I now experience is that, due to the asynchronous process call, the rotation can take place before the compression has actually been executed, so I get a very small compressed file and a rotated file that still has the original size...

mahadi commented 4 years ago

I guess I need to set up a separate compression process (perhaps via crontab) only for the already rotated files.
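
For what it's worth, a rough sketch of that out-of-band approach (directory and file pattern are hypothetical; it only touches files that have already been rotated):

import bz2
import glob
import os
import shutil

# Compress already-rotated log files in place; run this from cron or a timer,
# independently of the logging process.
for path in glob.glob("/var/log/myapp/*.log.*"):
    if path.endswith(".bz2"):
        continue  # already compressed
    with open(path, "rb") as src, bz2.open(path + ".bz2", "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)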

Delgan commented 4 years ago

Thanks for the report. I hadn't thought of that problem.

In this case, I consider that loguru needs to be improved. The file passed to the compression function should be "final". If a renaming is needed (to avoid conflicts), it should happen before the compression. I'm going to change loguru in that direction.

Delgan commented 4 years ago

I slightly refactored the compression handling. Since v0.5.0 it should be safe to run any custom compression function in a different Thread or Process. :+1:
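
For example, a custom compression function along the lines of the one above, handing the work to a separate process (file name and rotation are placeholders; per the change described here, the path received by the function should already be the final, renamed file):

import subprocess

from loguru import logger

def compress_in_subprocess(filepath):
    # bzip2 replaces the file with filepath.bz2; running it in its own process
    # keeps the compression work out of the Python interpreter entirely.
    subprocess.Popen(["bzip2", filepath])

logger.add("app.log", rotation="1 GB", compression=compress_in_subprocess, enqueue=True)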

If you have any other problem, feel free to reopen this issue. :)

mahadi commented 4 years ago

Tried it and it worked :) Thanks a lot for your efforts!

mahadi commented 4 years ago

Unfortunately, I have already observed three times that the main program hangs at the start of the compression and can only be ended by kill -9. In contrast, I also ran a test for 72 h without any problem...

The warning seems not to be issued as it is not present in the log file. Renaming of the logfile has been performed.

This is my current code:

    def compress_in_subprocess(filepath):
        logger.warning(f"Compression takes place on '{filepath}'")
        # needs to run in a separate subprocess as it would otherwise block the current Python process' resources
        subprocess.Popen(f"bzip2 {filepath}", shell=True)

    # ....

    logger.add(logfile_path,
               level="INFO",
               format=LOG_FORMAT,
               rotation=rotator.should_rotate,
               compression=compress_in_subprocess,
               enqueue=True)

mahadi commented 4 years ago

@Delgan: I guess the issue needs to be reopened

Delgan commented 4 years ago

@mahadi It looks like a deadlock due to the usage of logger inside the compression function.

Here is a small snippet to easily reproduce the problem:

import time
from loguru import logger

def compress(f):
    logger.warning("Compression")

logger.add("foo.log", rotation=lambda _, __: True, compression=compress, enqueue=True)

for i in range(100000):
    logger.info("Not hanging yet... {}", i)
    time.sleep(0.001)

What happens is that each handler uses a Lock to guarantee thread-safety. The lock is acquired before sending the log message to the sink and released once it has been processed. The compression step is protected by the lock, too. Consequently, if one tries to use the logger inside the compression function, it will try to acquire a lock that is already held and hang.

The bug happens systematically if enqueue=False. On the other hand, if enqueue=True the function sometimes has enough time to finish before the next message is processed by the queue worker, so the bug is more sporadic.

This is the same problem as if someone tries to use the logger inside a sink. One has to be careful not to do that.
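
The mechanism can be boiled down to a non-reentrant lock being acquired twice by the same thread; a bare-bones illustration (plain threading, not loguru's actual code):

import threading

lock = threading.Lock()  # non-reentrant, like the per-handler lock described above

def emit(message):
    with lock:                 # taken for every "log call"
        if message == "rotate":
            compress()         # rotation/compression run while the lock is held

def compress():
    emit("compressing...")     # "logging" from inside the compression: second acquire -> deadlock

emit("rotate")                 # this call never returns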

mahadi commented 4 years ago

OK, understood. I didn't think about this. Perhaps it would be good if this were stated in the documentation, to prevent others from making the same mistake.

Delgan commented 4 years ago

Yes, there is a warning about using the logger inside a sink, but I didn't think it could happen with compression / rotation / retention either. Ideally it shouldn't hang, because we'll often tend to forget this detail, but I haven't found a solution.

Delgan commented 4 years ago

I've added a warning to the docs at the end of the paragraph about file sinks. :+1: