Stdlib logging references sys.error on import

jayvdb commented 5 years ago

As special case of https://github.com/bskinn/stdio-mgr/issues/71 , and likely to be one of several cases, stdlib logging creates a reference to sys.error on import. As a result the following fairly common scenario of using a main() to test a cli without subprocesses fails if logging is first imported within the context:

from stdio_mgr import StdioManager

def invoke_cli_main():
    import logging
    logging.basicConfig()
    ...

def test_cli():
    with StdioManager():
        invoke_cli_main()
    import logging
    logging.warning('BOOM')

>>> test_cli()
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib64/python3.7/logging/__init__.py", line 1037, in emit
    stream.write(msg + self.terminator)
  File "/home/jayvdb/projects/python/stdio-mgr/src/stdio_mgr/stdio_mgr.py", line 90, in write
    super().write(*args, **kwargs)
ValueError: I/O operation on closed file.
Call stack:
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in test_cli
Message: 'BOOM'
Arguments: ()

After the context handler exits, the wrapper streams are closed, and the logging is still using them.

Easy solution: stdio_mgr imports logging, like we would need to for https://github.com/bskinn/stdio-mgr/issues/43

Even the following doesnt dislodge the problem:

def invoke_cli_main():
    import logging
    logging.basicConfig()
    ...
    from logging.config import _clearExistingHandlers
    _clearExistingHandlers()

To clear it properly we need to do:

from stdio_mgr import StdioManager

def invoke_cli_main():
    import logging
    logging.basicConfig()
    ...
    from logging.config import _clearExistingHandlers
    _clearExistingHandlers()
    logging.Logger.root = logging.root = logging.RootLogger(logging.WARNING)
    logging.Logger.manager = logging.Manager(logging.Logger.root)

def test_cli():
    with StdioManager():
        invoke_cli_main()
    import logging
    logging.info('OK')

test_cli()

or the following also seems to work

def invoke_cli_main():
    import logging
    logging.basicConfig()
    ...
    from logging.config import _clearExistingHandlers
    _clearExistingHandlers()
    logging.Logger.manager._clear_cache()

We can hide that logging cleanup inside of StdioManager. It also works when import colorama is added to that mixture, but I do not feel comfortable that it is solved.

Thankfully there doesnt seem to be too many references to sys.std* in the stdlib, so we might be able to check and workaround them all, whenever feasible and appropriate.

bskinn commented 5 years ago

Scattershot of questions:

What about applications that use other than the .basicConfig() logging setup? Does it still misbehave? Will this case need to be handled separately?
I anticipate using non-.basicConfig() logging for #43 -- does this change how this needs to be handled?
Hopefully a grep -rE 'sys[.]std(in|out|err)' *.py in the CPython repo would suffice to identify all such stdlib references to the streams?
```
$ grep -rE 'sys[.]std(in|out|err)' lib/* | grep -v '/test/' | wc -l
400
```
To what extent do we have to worry about interference from the C portions of CPython ?

jayvdb commented 5 years ago

shutdown snippet for py34-37

            try:
                logging.config._clearExistingHandlers()
            except AttributeError:  # pragma Python 3.6,3.7: no cover
                logging._handlers.clear()
                logging.shutdown(logging._handlerList[:])
                del logging._handlerList[:]

            try:
                logging.Logger.manager._clear_cache()
            except AttributeError:  # pragma Python 3.7: no cover
                logging.Logger.root = logging.root = logging.RootLogger(
                    logging.WARNING)
                logging.Logger.manager = logging.Manager(logging.Logger.root)

jayvdb commented 5 years ago

Any import logging causes the problem, as the references to our fake streams is done during module loading.

If we import logging first those references will be to the 'real' streams, or at least the streams as they existed when stdio_mgr was imported. That is the easy fix, but only for the one specific problem.

The "400" isnt so scary as many of them are only using the streams. The problem is stored references to the streams, especially references created a module import (class creation) phase. This includes any time sys.std* are used as an argument default as those defaults are in the class object.

I havent scanned the C portion, so I dont know the size of the problem there.

Rather than finding them all, which would still only encompass the stdlib, I think we need to first focus on detecting the problem (and issuing warnings, or raising exceptions, etc).

One way that might work is for the streams pushed into sys to be reference counted, e.g. using weakref's or sys.getrefcount , and the __exit__ complain bitterly if any of those streams still have living objects referring to them.

jayvdb commented 5 years ago

Technically this is post-existing stream states in the case of pytest + logging because pytest doesnt import logging. But a different test runner would import logging, and a whole new slew of problems would be caused ...

bskinn commented 5 years ago

One way that might work is for the streams pushed into sys to be reference counted, e.g. using weakref 's or sys.getrefcount , and the __exit__ complain bitterly if any of those streams still have living objects referring to them.

If my mental model of Python name assignment is correct, I suspect there's not any way to invisibly "intercept" assignments to our mocked objects, such that we could track such assignments and then silently "re-target" the assignments once we're ready to __exit__, or whatever?

The GC has to know both ends of assignments in order to detect cycles...could we hijack that machinery somehow? [...looks...] Looks like GC.get_referrers might help, though it specifically says not to use it for production code. :-P

bskinn commented 4 years ago

If my mental model of Python name assignment is correct, I suspect there's not any way to invisibly "intercept" assignments to our mocked objects, such that we could track such assignments and then silently "re-target" the assignments once we're ready to __exit__, or whatever?

But -- what if we added a toggle flag to our object, default to False, that when switched to True turns it into a silent, complete pass-through to the .prior_stdfoo? I don't think we can reach out to other code and rebind their references, but if we can just switch our objects, which those references remain bound to, to a no-op pass-through to the old objects... might work ok?

Could foul some is tests in that external code... but having is be False is probably good... we could try to set it up so that == is True but is is False? Gut-feel, that seems like a sensible way for it to work.

bskinn / stdio-mgr

Stdlib logging references sys.error on import #73