Qix- / better-exceptions

Pretty and useful exceptions in Python, automatically.
MIT License

Simplify handling of encoding #69

Open Delgan opened 5 years ago

Delgan commented 5 years ago

Hi @Qix-

Taking advantage of the fact that we no longer need to support Python 2.7, I think we can greatly simplify how we manage string encoding.

This avoids the use of the hacky ProxyBufferStreamWrapper class, the dubious to_unicode() and to_bytes() functions, and the conditional statement in write_stream().

Basically, we just format the exception as a unicode string and let the sys.stderr stream handle the encoding; there is no need to deal with .buffer bytes.

You may notice one side effect: UTF-8 characters like "天" are no longer displayed as-is on ASCII terminals. I think this is actually the correct way to do it.

One thing I don't understand about the current implementation: on the one hand, we test whether └ can be encoded and fall back to -> on error; on the other hand, we manage to print "天" in all cases. This results in tracebacks which may look like -> "天". It's paradoxical: either we can display UTF-8 characters or we can't. I suspect that this is the source of the problems encountered by some users.

By writing unencoded unicode to sys.stderr, the unprintable characters are automatically escaped by the stream's backslashreplace error handler (the default for sys.stderr in Python 3), so the output is -> "\u5929" on ASCII terminals and └ "天" otherwise.
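To illustrate the escaping described above, here is a minimal sketch. It does not use better_exceptions itself: the `render` helper is hypothetical, and an in-memory `io.TextIOWrapper` stands in for sys.stderr, assuming the stream uses the `backslashreplace` error handler that Python 3 applies to sys.stderr by default.

```python
import io


def render(text, encoding):
    """Return the bytes a text stream with this encoding would emit.

    Hypothetical helper: the wrapper mimics sys.stderr, which uses the
    'backslashreplace' error handler by default in Python 3.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors="backslashreplace")
    stream.write(text)
    stream.flush()
    return raw.getvalue()


# On an ASCII stream the non-encodable character is escaped, not dropped.
ascii_out = render('-> "天"', "ascii")
# On a UTF-8 stream the same kind of text passes through unchanged.
utf8_out = render('└ "天"', "utf-8")

print(ascii_out)
print(utf8_out)
```

The formatter only ever produces `str`; the decision about what is printable is left entirely to the stream.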

Also, I replaced locale.getpreferredencoding() with STREAM.encoding: we are writing to STREAM (sys.stderr), so why not use the encoding it declares? Using locale.getpreferredencoding() has been shown to display mojibake for some users, so maybe this will fix it.
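A rough sketch of the idea, assuming the marker is chosen by testing it against the target stream's own encoding. `pick_marker` and its parameters are hypothetical names for illustration, not the library's API.

```python
import io


def pick_marker(stream, preferred="└", fallback="->"):
    """Return `preferred` if the stream's encoding can represent it.

    Hypothetical helper: it consults the stream's declared encoding
    rather than the locale's preferred encoding.
    """
    encoding = getattr(stream, "encoding", None) or "ascii"
    try:
        preferred.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        return fallback
    return preferred


# In-memory streams stand in for sys.stderr with different encodings.
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
utf8_stream = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")

print(pick_marker(ascii_stream))
print(pick_marker(utf8_stream))
```

Because the check and the eventual write go through the same encoding, the marker and the escaped values cannot disagree about what the terminal can display.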

I ran some tests on both Linux and Windows and compared exception formatting between the standard handler and better_exceptions for various locale and IO encodings. The handling of UTF-8 characters is now identical to what the default exception handler does, so I think this should reduce encoding-related problems.

This pull request is made for the python3_only branch.

Delgan commented 5 years ago

@Qix- I realized there is actually a problem with this solution.

Given this code running on an ASCII terminal:

a = "天"
"天" * a

Without better_exceptions:

Traceback (most recent call last):
  File "a.py", line 6, in <module>
    "\u5929" * a
TypeError: can't multiply sequence by non-int of type 'str'

With better_exceptions/master:

Traceback (most recent call last):
  File "a.py", line 6, in <module>
    '天' * a
            -> '天'
TypeError: can't multiply sequence by non-int of type 'str'

With better_exceptions/simplify_encoding:

Traceback (most recent call last):
  File "a.py", line 6, in <module>
    '\u5929' * a
            -> '\u5929'
TypeError: can't multiply sequence by non-int of type 'str'

The column where -> should start is computed incorrectly, because it is calculated from the non-encoded source string. I have a solution but can't really apply it here because of other problems with source formatting. I made a branch based on this one which fixes source formatting and where this can be easily fixed.
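One possible way to compute the column, sketched under the assumption that the offset should be measured on the text as it will actually be written to the stream. `escaped_column` is a hypothetical helper, not the fix from the branch mentioned above, and it ignores the display width of wide characters.

```python
def escaped_column(source, column, encoding):
    """Map a column in `source` to its column after escaping.

    Hypothetical helper: the prefix before `column` is escaped with
    'backslashreplace', so its length matches what the stream prints.
    """
    prefix = source[:column]
    escaped = prefix.encode(encoding, "backslashreplace").decode(encoding)
    return len(escaped)


source = '"天" * a'
column = source.index("a")  # position of `a` in the raw source

# "天" becomes the six characters \u5929 on ASCII, shifting `a` right.
ascii_col = escaped_column(source, column, "ascii")
# On UTF-8 nothing is escaped, so the column is unchanged.
utf8_col = escaped_column(source, column, "utf-8")

print(column, ascii_col, utf8_col)
```

With the raw column, the marker lands under the escaped string literal instead of under `a`; using the escaped column moves it by the five extra characters that the escape sequence adds.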