danijar / handout

Turn Python scripts into handouts with Markdown and figures
Apache License 2.0
2.02k stars 106 forks source link

Duplicated logging messages when importing module twice #11

Closed epogrebnyak closed 5 years ago

epogrebnyak commented 5 years ago

I run example.py in IPython console in Spyder and it seems that console output gets proportional to the number of times I have run the script. On the first run in new kernel I get just single message, one per call of handout.Handout instance method.

This is what I get on the fifth run:

Iteration 0
Iteration 0
Iteration 0
Iteration 0
Iteration 0
Iteration 0
Iteration 1
Iteration 1
Iteration 1
Iteration 1
Iteration 1
Iteration 1
Iteration 2
Iteration 2
Iteration 2
Iteration 2
Iteration 2
Iteration 2
Handout written to: output\index.html
Handout written to: output\index.html
Handout written to: output\index.html
Handout written to: output\index.html
Handout written to: output\index.html
Handout written to: output\index.html
Saved figure: output\figure-0.png
Saved figure: output\figure-0.png
Saved figure: output\figure-0.png
Saved figure: output\figure-0.png
Saved figure: output\figure-0.png
Saved figure: output\figure-0.png
Handout written to: output\index.html
Handout written to: output\index.html
Handout written to: output\index.html
Handout written to: output\index.html
Handout written to: output\index.html
Handout written to: output\index.html
Saved figure: output\figure-1.png
Saved figure: output\figure-1.png
Saved figure: output\figure-1.png
Saved figure: output\figure-1.png
Saved figure: output\figure-1.png
Saved figure: output\figure-1.png
Saved figure: output\figure-2.png
Saved figure: output\figure-2.png
Saved figure: output\figure-2.png
Saved figure: output\figure-2.png
Saved figure: output\figure-2.png
Saved figure: output\figure-2.png
Saved figure: output\figure-3.png
Saved figure: output\figure-3.png
Saved figure: output\figure-3.png
Saved figure: output\figure-3.png
Saved figure: output\figure-3.png
Saved figure: output\figure-3.png
Handout written to: output\index.html
Handout written to: output\index.html
Handout written to: output\index.html
Handout written to: output\index.html
Handout written to: output\index.html
Handout written to: output\index.html

Maybe not a bug, but it would have felt more confortable the handout.Handout instance made just a single print to the screen per call.

I run the script as runfile('D:/github/handout/example.py', wdir='D:/github/handout')

danijar commented 5 years ago

Handout uses a separate logger instance, so you can do this:

import handout
import logging
logging.getLogger('handout').setLevel(logging.ERROR)

I'm also open to changing which messages are printed by default.

For example, I think doc.add_text() should always print so you can use it to replace print() where needed. Besides this, the path to the index.html could be printed once in the beginning to open the handout in the browser. All other messages could be hidden by default (e.g. log level info).

Do you have thoughts on this?

danijar commented 5 years ago

Taking a second look at what you say, it does seem incorrect that the same figure is added multiple times. Have you tried just running your script as python3 script.py?

It's not entirely clear to me how Python Handout would be supposed to behave in an interactive console. It goes in a different direction than what motivated the library, i.e. to create reports in a reproducible environment. But I'd be happy to talk through how this could work and if we can extend it in a reasonable way.

epogrebnyak commented 5 years ago

python example.py produces single output as inteded.

Changing the logging as suggested above did not help in IPython interactive session, each runfile() adds up to more lines of output. There is a state somewhere inside the interactive session. :)

epogrebnyak commented 5 years ago

Besides this, the path to the index.html could be printed once in the beginning to open the handout in the browser.

Maybe a path should be printed just for show() method? It would be a clutter to see paths for add_text(). In my opinion add_text() should behave like print().

danijar commented 5 years ago

Re amount of logging messages, I reopened #2 to track this for the next release.

Re executing from within Spyder, could you describe the setup a bit more, please? I assume runfile() imports the script and executes it inline so that its variables become available in the interactive environment? Does the generated handout look as you would expect?

There is no global state in Handout, so as long as you create a new handout.Handout() instance every time you generate a report, I'm not sure where the accumulating state comes from.

epogrebnyak commented 5 years ago

There is no global state in Handout indeed - but there is seemingly is some state in IPython logger configuration - it accumulates messages (maybe it is a desired behaviour for a REPL).

Minimal example:

import handout
doc = handout.Handout('output')
doc.add_text('Print this text')

Running once on new kernel in IPython (same output for python minimal.py):

[1]: runfile('D:/github/handout/minimal.py', wdir='D:/github/handout')
Print this text

But on next runs I get multiple logger messages, one more message per run:

[2]: runfile('D:/github/handout/minimal.py', wdir='D:/github/handout')
Reloaded modules: handout, handout.handout, handout.blocks, handout.tools
Print this text
Print this text

[3]: runfile('D:/github/handout/minimal.py', wdir='D:/github/handout')
Reloaded modules: handout, handout.handout, handout.blocks, handout.tools
Print this text
Print this text
Print this text

P.S. This is not a terrible thing, but rather a nuisance.

epogrebnyak commented 5 years ago

Maybe I can fill a question on Stack Overflow to ask for clues.

danijar commented 5 years ago

Thanks a lot for investigating! I'm quite surprised how the resulting handout looks like. Does it show the code lines that were executed in the REPL and in the correct order?

epogrebnyak commented 5 years ago

I should have mentioned the html output is normal (the 'handout'), the duplication of messages appears only in the IPython console.

One more line in the example:

import handout
doc = handout.Handout('output')
doc.add_text('Print this text')
doc.show() 

On the first run the console output is:

Print this text
Handout written to: output\index.html

On the second:

Print this text
Print this text
Handout written to: output\index.html
Handout written to: output\index.html

Both print() and logger messages get accumulated in the console.

danijar commented 5 years ago

Thanks. I was actually curious how the resulting handout looks like, because there is no source file when running handout from the interactive interpreter.

Besides this, would it be fair to say that this is a problem with IPython (and we can notify them about it) or is there something we can and should do on our end?

epogrebnyak commented 5 years ago

no source file when running handout from the interactive interpreter

There is the source when running with runfile('D:/github/handout/minimal.py', wdir='D:/github/handout') command. At least something with oriignal file content is executed.

I filled an issue with IPython, maybe someone can shed some light on it.

The issue is definitely on IPython side, but maybe we can do some tricky configuration of logger at init to prevent message duplication.

epogrebnyak commented 5 years ago

The best of my understanding is that a logger object a session-level singleton, that exists in IPython session. Any time the script is run in the session again the getLogger('handout') refers to existing instance.

Any time there is a new call _logger.info(message) we somehow create not just a logging event, but also a steam 'handler', that gets accumulated inside _logger.handlers.

Would need a bit of more research to find out what to do about it (in principle - randomise creation of a logger name or cut out adding more handlers to the logger).

danijar commented 5 years ago

This reminds me a bit of duplicate logging messages I've seen in TensorFlow. What do you think about getting rid of the logger and just using print() statements together with a quite=False option in the constructor?

epogrebnyak commented 5 years ago

Let me bear with this issue for a little while - it appears the problem is not even IPython, but it is specific to Spyder. I came up to running the script in IPython session in console and the messages do not duplicate:

In [1]: %run minimal.py
Print this text

In [2]: %run minimal.py
Print this text

Most Stack Overflow answers like this one deal with a situation where there is addHandler() call in script code and it gets repeated when script is called in the same session. Also there is a solution to set logger.propagate = False, but injecting it in Handout code had no effect.

The wierd point is to detect where the handlers get added in between the calls to the script. I would like to investigate a bit, if no result, can possibly use print. Maybe wrap print in own helper:

def log(message, logger=None):
    ```Logging utility to print *message* to console.```
    if logger:
        logger.info(message)
    else:
        print(message, silent=False) 

Some people may see logging vs printing as a matter of code style, so the decision will be up to you as an author and maintainer of the project.

epogrebnyak commented 5 years ago

God, @danijar! ))) you are actually adding a new handler in __init__.py!

Maybe this code should be inside handout.py or tools.py or new logging.py - __init__.py is the last place I looked for some configuration interference code, as I searched codebase of Spyder and IPython instead. ))

https://github.com/danijar/handout/blob/master/handout/__init__.py#L1-L13

import sys
import logging

from .handout import Handout

# Set up logger.
logger = logging.getLogger('handout')
logger.setLevel(logging.INFO)
logger.propagate = False  # Global logger should not print messages again.
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter('%(message)s'))
logger.addHandler(handler)

What happens in that IPython resets logger handlers after %run command, while Spyder keeps the handlers alive. The reason for this is some ambiguity in logging.shutdown() behaviour.

As a quick fix, we can do something like below for __init__py:

import sys
import logging

from .handout import Handout

# Set up logger.
logger = logging.getLogger('handout')
logger.setLevel(logging.INFO)
logger.propagate = False  # Global logger should not print messages again.
if not logger.handlers:
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(logging.Formatter('%(message)s'))
    logger.addHandler(handler)
epogrebnyak commented 5 years ago

The fix in __init__.py really works, but I suggest making things more explicit via create_logger() function. create_logger() may be placed in tools.py, for example

import sys
import logging

def create_logger():
    logger = logging.getLogger('handout')
    logger.setLevel(logging.INFO)
    logger.propagate = False  # Global logger should not print messages again.
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter('%(message)s'))
        logger.addHandler(handler)
    # The logger can be access anywhere in code by its name 
    # with logging.getLogger('handout') but we pass it explcitly
    # as function result
    return logger

later in Handout constructior:

self._logger = create_logger()
danijar commented 5 years ago

Thanks for investigating! Yes, we shouldn't keep adding handlers :D

The reason to create it in __init__.py is to make it available once the library is imported so the user can change the logging level, format, etc. So let's include your first solution for now? If you have a better solution, just let me know.