Drekin / win-unicode-console

A Python package to enable Unicode support when running Python from Windows console.
MIT License

Using atexit() apparently breaks win-unicode-console #25

Closed DervishD closed 8 years ago

DervishD commented 8 years ago

If win-unicode-console is installed as "Python patch" (in usersitecustomize, I mean) and you run this code on Windows (tested with Python 3.5 x64):

import atexit
atexit.register(lambda: input('Press ENTER to continue...'))
print(__file__)
exit()

then the following error occurs (I'm including full output of the program):

C:\[CENSORED]\test.py
Press ENTER to continue...Intenal win_unicode_console error
Traceback (most recent call last):
  File "C:\Program Files\Python35\lib\site-packages\win_unicode_console\readline_hook.py", line 84, in readline_wrapper
    line = self.readline_hook(prompt)
  File "C:\Program Files\Python35\lib\site-packages\win_unicode_console\readline_hook.py", line 59, in stdio_readline
    return sys.stdin.readline()
  File "C:\Program Files\Python35\lib\site-packages\win_unicode_console\streams.py", line 197, in readline
    return self.base.readline(size)
ValueError: I/O operation on closed file.

That's all. If you need more information or want me to carry out any test, just tell me.

Thanks a lot for win-unicode-console, it's awesome and it should be included with Python by default on Windows...

Raúl

Drekin commented 8 years ago

Hello, thank you for your support.

Regarding the issue, it has nothing to do with win-unicode-console. Essentially the same behaviour occurs without win-unicode-console as well. The point is that calling exit closes sys.stdin, so you get an exception whenever you try to read input from a user afterwards (e.g. in an atexit handler). I don't know why exit closes sys.stdin, but there seems to be a reason: https://hg.python.org/cpython/file/default/Lib/_sitebuiltins.py#l23. However, if you use sys.exit instead of exit, it works fine.
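A minimal sketch of the difference, for anyone who wants to check it without a console. It swaps in an io.StringIO as a stand-in for sys.stdin, calls each exit function, and records whether the stand-in got closed. It assumes CPython 3.4 or later, where the site-provided exit is an instance of Quitter from the private _sitebuiltins module (the file linked above). The helper name stdin_closed_after is made up for this illustration.

```python
import io
import sys

# Private CPython module that defines the exit()/quit() objects installed by site.
from _sitebuiltins import Quitter


def stdin_closed_after(exit_func):
    """Call exit_func with a stand-in stdin; return True if it was closed."""
    original = sys.stdin
    sys.stdin = io.StringIO("typed by the user\n")
    try:
        try:
            exit_func()
        except SystemExit:
            # Both exit functions raise SystemExit; swallow it so we can inspect stdin.
            pass
        return sys.stdin.closed
    finally:
        sys.stdin = original


# Build the same kind of object that site installs as the builtin exit.
site_exit = Quitter("exit", "Ctrl-Z plus Return")

closed_by_site_exit = stdin_closed_after(site_exit)
closed_by_sys_exit = stdin_closed_after(sys.exit)

print("exit() closed stdin:    ", closed_by_site_exit)
print("sys.exit() closed stdin:", closed_by_sys_exit)
```

With a closed sys.stdin, any input() call made later from an atexit handler fails with the ValueError shown in the report, whether or not win-unicode-console is installed.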

DervishD commented 8 years ago

Sorry, Drekin, I'm utterly stupid, I should have tested without win-unicode-console before sending the bug report. And the worst part is that I already knew exit() closes sys.stdin because a similar bug bit me a long time ago...

Problem is, I had forgotten that bug and since the error message came from win-unicode-console and included the string Internal win_unicode_console error, I jumped to conclusions without thinking about it twice...

In my defense, I just wanted to be useful...

BTW, the error message says Intenal instead of Internal ;)

Thanks a lot for the prompt response and of course for win-unicode-console

Drekin commented 8 years ago

It's OK, and that “Intenal” typo actually is a bug. :-)

You are right about the error message, but it is produced in a custom readline hook, which is called by C code, so I cannot easily raise an exception. The partial traceback is actually just printed, and I (wrongly in this case) assume that it means there is a bug in the implementation of the readline hook itself.

DervishD commented 8 years ago

I think it's safe to assume it means there's a bug in the implementation rather than adding an exception for this corner case, which is also not very frequent, as using exit() in a script is a weird thing to do. In fact I discovered all this because I accidentally deleted the sys. bit in front of the exit(). Plain exit() as defined in site is intended to be used from the interactive loop, so in a certain way it makes sense for it to close sys.stdin.

Other than the typo, I haven't found anything wrong in win-unicode-console, it works transparently and perfectly. Now my scripts can use utf-8 safely in the otherwise crappy Windows command prompt. Maybe next century MS will convert it to a full utf-8 terminal emulator...

Drekin commented 8 years ago

Windows command prompt is not that bad. As far as I know, it just doesn't display astral characters properly, and it is not straightforward to use a custom font, but it supports Unicode BMP. Windows uses UTF-16-LE instead of UTF-8, but is that really a problem? It would be enough if programmers used proper abstractions – in Python 3 a string is a sequence of Unicode codepoints, so it is neither UTF-8 nor UTF-16-LE, and the actual low-level representation doesn't matter.
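To illustrate the point about abstractions, here is a small sketch that compares the length of a string in code points with the size of its UTF-8 and UTF-16-LE encodings. The two sample characters (U+015D from the BMP and U+1F600 as an astral character) are chosen only for illustration.

```python
# U+015D (s with circumflex) is in the BMP; U+1F600 is an astral character.
samples = {"bmp": "\u015d", "astral": "\U0001F600"}

sizes = {}
for name, text in samples.items():
    sizes[name] = {
        # A Python 3 str counts code points, whatever the storage format is.
        "code_points": len(text),
        # The byte counts only appear once an encoding is chosen.
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16le_bytes": len(text.encode("utf-16-le")),
    }

for name, info in sizes.items():
    print(name, info)
```

Both strings have length 1 as far as Python code is concerned. The encoding only matters at the boundary where text is turned into bytes.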

DervishD commented 8 years ago

As far as I know you can't print certain chars from Python 3 into Windows command prompt without using win-unicode-console or similar. Just spewing UTF-16-LE bytes using sys.stdout.buffer.write(), for example, doesn't work.

For example, try to print ŝ from a Python script on Windows command prompt without using win-unicode-console. And yes, this is probably Python's fault for not using the WriteConsoleW function, but at least a program should be able to print any char in Windows command prompt by printing the bytes corresponding to the UTF-16-LE encoding of that char. For me, it doesn't work, otherwise I wouldn't need win-unicode-console.

BTW, win-unicode-console doesn't support the "buffer" member (instance of io.BufferedIOBase) in sys.stdout! I don't know how widely used that is, but I bet some script uses it, and then IMHO win-unicode-console should provide that member.

anthrotype commented 8 years ago

@DervishD you can always resort to the original sys.__stdout__, sys.__stderr__ and sys.__stdin__ if you want to get the buffer attribute. win-unicode-console is for text rather than binary data.

DervishD commented 8 years ago

I know, Cosimo, and I've never used the buffer myself, but I'm not sure if any of the scripts I may need use it or not.

I wrote a module, some time ago, which redirects sys.stdout and sys.stderr to a Tk widget (I wrote a PyQT5 version, too), and I used sys.__stdout__ and sys.__stderr__ to restore the streams when done.

Thanks anyway :)

anthrotype commented 8 years ago

I knew you knew ;)

Drekin commented 8 years ago

Python just communicates with Windows console using the current ANSI codepage. Unfortunately, there seems to be no working universal codepage, and using one would require changing some Windows setting. So just using the WinAPI -W functions seems to be the right way.
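A rough sketch of why a single legacy codepage is not enough: it checks whether ŝ (U+015D) can be encoded in a couple of codepages. cp437 and cp1252 are only examples picked for this illustration, since the codepage actually in use depends on the Windows configuration, and the helper name encodable is hypothetical.

```python
def encodable(text, encoding):
    """Return True if text can be represented in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


char = "\u015d"  # s with circumflex

# Example legacy codepages (assumed for illustration) plus UTF-16-LE for comparison.
results = {name: encodable(char, name) for name in ("cp437", "cp1252", "utf-16-le")}

for name, ok in results.items():
    print(name, "can encode it" if ok else "cannot encode it")
```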

Regarding the buffer attribute, it actually is present on first-choice custom stream objects – streams.std*_text use the standard Python 3 IO hierarchy. But it naturally uses the encoding used natively – UTF-16-LE. Unfortunately, the Python tokenizer cannot handle UTF-16-LE because of the null bytes, so there is another layer – transcoding wrappers streams.std*_text_transcoded that just wrap the first-choice streams so they can have a UTF-8 encoding attribute.

On the other hand, the usefulness of the buffer attribute is rather limited – you can use it only if you have data already encoded with the right encoding. If you have a string, then you don't need buffer, and if you have bytes in the wrong encoding, you have to decode them to a string first.
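The three cases can be sketched with the standard io classes alone. This is not win-unicode-console code: an io.BytesIO wrapped in an io.TextIOWrapper with encoding UTF-16-LE stands in for a console stream, which is an assumption made so the example runs anywhere.

```python
import io

# Stand-in for a console stream whose native encoding is UTF-16-LE.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="utf-16-le", newline="")

# Case 1: you have a string, so the text layer is all you need.
stream.write("\u015d")
stream.flush()

# Case 2: you have bytes already in the stream's encoding, so buffer works.
stream.buffer.write("\u015d".encode(stream.encoding))

# Case 3: you have bytes in another encoding (UTF-8 here), so decode first.
utf8_bytes = b"\xc5\x9d"
stream.write(utf8_bytes.decode("utf-8"))
stream.flush()

written = raw.getvalue()
print(written)
```

All three writes end up as the same two bytes, which is why buffer adds little for text output.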

DervishD commented 8 years ago

So it doesn't make much sense adding it to win-unicode-console, right? I don't recall having seen buffer used in any script or module I've ever used, anyway.

Thanks for everything, Adam :)

Drekin commented 8 years ago

It's already there, but not on the currently used custom streams. See sys.stdout.base.buffer and the documentation for details. The point is that buffer has to be UTF-16-LE based, and sys.stdout.encoding has to correspond to this encoding, and sys.stdout.encoding and sys.stdin.encoding should be the same, and the Python tokenizer cannot handle UTF-16, so all these constraints cannot be satisfied at the same time. Something like my proposal on http://bugs.python.org/issue17620 could resolve this.

You are welcome. :-)

DervishD commented 8 years ago

I actually did read that particular bug report when looking for a solution to print utf-8 in Windows Console Host (that's how I finally found win-unicode-console), and I hope it gets fixed so Python can incorporate your solution on Windows. Not that I don't like win-unicode-console, but it can be complex to install in sitecustomize or usersitecustomize for someone who just wants to play with Python. If the interpreter comes with the solution by default, it's easier.

I have my fingers crossed for that and bug 1602 to be closed... Won't hold my breath, though.