Closed Ch41r05 closed 1 year ago
Could you provide a reproducible snippet?
Hi @AmenRa ,
Here is a script that reproduces the error. The logger class only has this problem when SearchEngine().index() is run, if commented the logs are correctly printed.
I may have done something wrong, but I can't seem to find the cause for this since every other test I've run didn't reproduce the issue.
Hi, i removed from commons.logger import Logger
as I do not know what commons
is.
Everything seems to work fine on my end. I get this in both my terminal and the log file:
[2023-09-02 09:16:28,031] {binary_log.py:55} INFO - Building index...
[2023-09-02 09:16:28,303] {binary_log.py:57} INFO - Index built.
@AmenRa I checked and confirmed that even without the import the console log works fine but the log file still has the problem, I tested this script on two different PCs which both run windows. Maybe this could be something related to the operating system?
Also, can I ask you your configuration? Thing like OS, Python version, and the such. Thanks a lot
Hi @AmenRa I did some further testing and I found something interesting. If I change the order of the lines
logger.info("Building index...")
SearchEngine("new-index").index(collection, show_progress=False)
logger.info("Index built.")
to
logger.info("Building index...")
logger.info("Index built.")
SearchEngine("new-index").index(collection, show_progress=False)
the log file becomes empty. So it seems from the tests I conducted that somehow rertiv is still sending something to stdout even when with show_progress=False. If I comment the indexing line, the log correctly is shown in the file. I'm running on windows 11 with python 3.10.0. I don't know if I can give you any more information, but please tell me if you need something else to try to reproduce the error in your environment.
Hi @AmenRa , quick update, I tried changing the logging library with loguru, and the issue still occurs when opening the file in W mode. From further digging, I found that the log file has NUL (yes, with just one "L") values before the last line, this seems to be an issue related to the underlying open file command and multithreading/multiprocessing which, given the high speed of retriv, I guess it's being done under the hood. I'll try to see if I can dig some other information.
Hi, show_progress
is passed to a tqdm
progress bar in this fashion:
tqdm(
...
disable=not show_progress,
)
Try running a loop with tqdm
like this to verify whether tqdm
is (one of) the cause:
from tqdm import tqdm
logger.info("Building index...")
for _ in tqdm(range(10_000), disable=True):
continue
logger.info("Index built.")
Other tqdm
options that I use are:
desc="something"
dynamic_ncols=True
mininterval=0.5
Honestly, I am not aware of other things that may interfere with a logger.
Also, I do not have a Windows machine, so maybe the issue is related to Windows / Windows + tqdm
.
Hi @AmenRa , thanks for your help! I ran the sample code you provided me, sadly (?) everything worked fine, the output was the expected one in both the console and the logfile. Seems tqdm isn't the culprit here. I agree that the problem may lie in the fact that I'm using Windows. As I was saying this kind of behavior seems to be linked on how files are opened under the hood. I'll try to dig deeper and keep you updated if I find anything interesting, since this behavior is strange and not as intended I guess!
Hi @AmenRa, thanks for your support, I finally solved the issue. The problem doesn't lie in retriv per se, but in the fact that's using multiprocessing. The logger was instantiated outside the if name == "main" section of the code, so the multiprocessing executed it more than once, leading to unexpected output. Once the instantiation of the code has been moved under the if, everything works as expected. Leaving this here so other windows users won't be caught in OS this difference.
Hi again,
I'm stuck with a strange behavior, that from my tests seems to be related to the use of the SearchEngine. I'm using a SingletonLogger that logs everything to stdout and persists that log onto a file. When the program runs, the index takes a bit of time to be calculated and if I check the logfile, I can correctly see everything printed until this point. After the SearchEngine finishes calculating the index, the first row of the logfile becomes a series of nul values. Below a sample of code and of the log file. Can anyone give me pointers to solve this?
logfile.log