bskinn / stdio-mgr

Context manager for mocking/wrapping stdin/stdout/stderr
MIT License
14 stars 4 forks source link

Replace buffers inside of existing sys.std* wrappers #75

Open jayvdb opened 5 years ago

jayvdb commented 5 years ago

Currently we replace the sys.std* objects with new wrappers.

It might be possible to detach the existing buffers in those wrappers, and attach new ones to the existing wrappers.

That solution is a superset of https://github.com/bskinn/stdio-mgr/issues/74 , making it unnecessary.

This should be easier on py37 with its TextIOWrapper.reconfigure, but would be ok if we can create backports https://github.com/bskinn/stdio-mgr/issues/33

This possibly makes https://github.com/bskinn/stdio-mgr/pull/64 less desirable, as-is.

jayvdb commented 5 years ago

Another approach at https://stackoverflow.com/a/1736047 , and a comment links to https://stackoverflow.com/a/881751 which includes the Windows equivalent.

jayvdb commented 5 years ago

As a bit of a summary of things found so far ... the documented TextIOWrapper has no stream-reconfiguration of the buffer, so according to the docs there is no way to change sys.stdout.buffer while keeping sys.stdout unmodified.

There are benefits of keeping the sys.stdout unmodified, and fiddling with its internals, because:

1 predicting/fixing where sys.stdout was stored before the stdio_mgr context is not possible- any pre-existing references to the original sys.stdout could be used at any time during the context,

  1. the fake sys.stdout could be stored anywhere during the context, and then used again after the context (the logging problem)

It is more feasible to detect (2), but it is a lot of work.

It would be good to add some tests which demo each of these two problems with the current stdio_mgr.


There is a good reason to not fiddle with the insides of sys.stdout if it is the same as sys.__stdout__, because the Python docs say not to touch the later, and by extension they probably also mean the insides of the former shouldnt be touched either.


But, if we do want to fiddle...

The easy case is the linux console io streams (i.e. sys.stdout and sys.stderr). They can be reconfigured with a new underlying buffer of arbitrary buffer type. sys.stdout.buffer.__init__(io.BufferedRandom(io.BytesIO())) is probably enough.

The hard case is the windows console io, which has a few modes, and has been constantly improving, but with backwards compatibility. The sub-case I have been targeting so far is PYTHONUNBUFFERED=1 as that appears similar on Windows and Linux. PYTHONUNBUFFERED=1 is also python -u however the -u can not be easily detected.

PYTHONUNBUFFERED=1 console io buffer is picky. It does not accept a buffer object. It needs a filename or fileno. That means we need to create a buffer which has a fileno(), and TemporaryFile is the most obvious candidate for creating those. However TemporaryFile is a function, not a class, which means if we want to add custom behaviour we need to use __new__ and add an override of FileIO.__init__. The current implementation in the POC PR is horrendous, leaking resources everywhere, only intended to verify it is possible to do it without breaking sys.__stdout__ (but it isnt stress tested yet, and is likely to be breakable with threads).

The main motivation of proceeding with this 'feature', even though it is discouraged by Python devs, is the situation like I had with coala, where implementing the current stdio_mgr was failing due to logging - I could narrow it down and find a way to reset the logging module. Other codebase will have even more strange problems, and they may not be as easily bypassed. It would be helpful to have another mode(s) which can be used, as a way of bisecting the error reports.

bskinn commented 5 years ago

Hmmm... so, the internal noodling could be intended mainly as a debugging tool? Helping us diagnose weird things in different situations?

jayvdb commented 5 years ago

Yes, and help users do the same. While the POC makes them the default implementation, that is mostly so that I can test the branch easily in other repos. That would be removed before being mergable - maybe replaced with envvars to force a potentially bad implementation for debugging/investigation purposes.