Open Delgan opened 5 years ago
@Qix- I realized there is actually a problem with this solution.
Given this code running on an ASCII terminal:
a = "天"
"天" * a
Without better_exceptions:
Traceback (most recent call last):
File "a.py", line 6, in <module>
"\u5929" * a
TypeError: can't multiply sequence by non-int of type 'str'
With better_exceptions/master:
Traceback (most recent call last):
File "a.py", line 6, in <module>
'天' * a
-> '天'
TypeError: can't multiply sequence by non-int of type 'str'
With better_exceptions/simplify_encoding:
Traceback (most recent call last):
File "a.py", line 6, in <module>
'\u5929' * a
-> '\u5929'
TypeError: can't multiply sequence by non-int of type 'str'
The column where -> should start is computed incorrectly, because it is calculated from the unencoded source string rather than from the escaped string that is actually displayed. I have a solution, but I can't really fix it here because of other problems with source formatting. I made a branch based on this one which fixes source formatting, and where this can be fixed easily.
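To illustrate the column problem described above, here is a minimal sketch of one way to compute the marker position from the escaped text instead of the raw source. It is not the fix from the other branch (which is not shown here); the helper names `escape_for` and `displayed_column` are hypothetical, and the sketch assumes the terminal escapes unencodable characters with `backslashreplace`.

```python
def escape_for(text, encoding):
    """Return text as it would appear on a stream with this encoding,
    with unencodable characters replaced by backslash escapes."""
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


def displayed_column(source, col, encoding):
    """Column of source[col] after everything before it has been escaped.

    Hypothetical helper: counts characters only, so it ignores the fact
    that wide characters may occupy two terminal cells.
    """
    return len(escape_for(source[:col], encoding))


line = '"天" * a'
col = line.index("a")  # column in the raw, unencoded source

print(displayed_column(line, col, "ascii"))  # column once "天" is shown as \u5929
print(displayed_column(line, col, "utf-8"))  # column when "天" is printed as-is
```

On an ASCII stream the escaped prefix is longer than the raw one, which is why a column taken from the raw source puts the marker too far to the left.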
Hi @Qix-
Taking advantage of the fact that we no longer need to support Python 2.7, I think we can largely simplify how we manage string encoding.
This avoids the use of the hacky ProxyBufferStreamWrapper class, the dubious to_unicode() and to_bytes() functions, and the conditional statement in write_stream().

Basically, we just format the exception as a unicode string and let the sys.stderr stream handle the encoding; there is no need to deal with .buffer bytes.

You may notice one side-effect: non-ASCII characters like "天"
are no longer displayed as-is on ASCII terminals. I think this is actually the correct way to do it.

One thing I don't understand about the current implementation is that on the one hand we test whether └ can be encoded and fall back to -> on error, while on the other hand we manage to print 天 in all cases. This results in tracebacks which may look like -> "天". It's paradoxical: either we can display UTF-8 characters or we can't. I suspect that this is a source of the problems encountered by some users.

By writing unencoded unicode to
sys.stderr, the unprintable characters are automatically escaped by the stream's error handler (backslashreplace for sys.stderr), so the line is displayed as -> "\u5929" on ASCII terminals and as └ "天" otherwise.

Also, I replaced
locale.getpreferredencoding() with STREAM.encoding, because we are writing to STREAM (sys.stderr), so why not use its specified encoding? Using locale.getpreferredencoding() proved to display mojibake to some users, so maybe this will fix it.

I made some tests on both Linux and Windows and compared exception formatting between the standard handler and
better_exceptions, based on locale and I/O encoding. The handling of UTF-8 characters is now identical to what is done by the default exception handler, so I think this should reduce problems due to encoding.

This pull request is made for the python3_only branch.
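The encoding behaviour described in this pull request can be sketched as follows. This is only an illustration, not the code of the branch: `make_stream`, `pick_marker` and `render` are hypothetical names, and an `io.TextIOWrapper` with `errors="backslashreplace"` stands in for `sys.stderr` (which uses that error handler by default in Python 3). The marker is chosen from the stream's own encoding, and the stream itself escapes whatever it cannot encode.

```python
import io


def make_stream(encoding):
    """Simulate sys.stderr: a text stream over bytes, using backslashreplace."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors="backslashreplace")
    return raw, stream


def pick_marker(stream):
    """Use the box-drawing marker only if the stream's encoding supports it."""
    try:
        "└".encode(stream.encoding)
        return "└"
    except (UnicodeEncodeError, LookupError):
        return "->"


def render(encoding):
    """Write a marker line as plain text and return the bytes the stream produced."""
    raw, stream = make_stream(encoding)
    stream.write('{} "天"'.format(pick_marker(stream)))
    stream.flush()
    return raw.getvalue()


print(render("ascii"))  # marker falls back to ->, 天 is escaped by the stream
print(render("utf-8"))  # marker and 天 are both encoded as-is
```

With this approach the marker and the value are always consistent with each other: both are either printable on the target stream or replaced.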