atemerev closed this 3 months ago
Logfiles from GitLab pipeline #202325 (failed) have been uploaded.
I agree that enforcing UTF-8 would be best; however, I tried your solution and it didn't work. I had better luck with this (in logging.py, before creating the StreamHandler):
import sys

# Ensure sys.stdout uses UTF-8 encoding (TextIOWrapper.reconfigure exists since Python 3.7)
if hasattr(sys.stdout, 'reconfigure'):
    sys.stdout.reconfigure(encoding='utf-8')
I think we tried that in the past and it failed when streaming those characters via Slurm. In at least one place, I believe I changed such a character to a plain ASCII one. If solution 2 can't be made to work easily, I'd propose we go with 1. (Maybe we can use an extended ASCII character, in the 128-255 range?)
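"Extended ASCII" is not a single character set: what a byte in the 128-255 range displays as depends on the code page the reader assumes. In IBM code page 437 the box-drawing ╚ is byte 0xC8, but a terminal decoding that same byte as latin-1 or cp1252 shows È, and latin-1 has no ╚ at all. A quick check with Python's built-in codecs (`byte_for` is a hypothetical helper, not part of the project):

```python
from typing import Optional


def byte_for(char: str, codepage: str) -> Optional[bytes]:
    """Return `char` encoded in `codepage`, or None if that code page lacks the character."""
    try:
        return char.encode(codepage)
    except UnicodeEncodeError:
        return None


BOX = "\u255a"  # the tree marker, BOX DRAWINGS DOUBLE UP AND RIGHT

print(byte_for(BOX, "cp437"))                 # b'\xc8'
print(byte_for(BOX, "latin-1"))               # None
print(b"\xc8".decode("latin-1") == "\u00c8")  # True: the same byte reads as a Latin letter
```

So an "extended ASCII" marker would only look right on terminals that happen to use the same code page as the one it was chosen from.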
Logfiles from GitLab pipeline #204101 (failed) have been uploaded.
@atemerev, to solve this issue quickly, shall we replace the special character ╚ with a simpler one?
Logfiles from GitLab pipeline #212672 (failed) have been uploaded.
Logfiles from GitLab pipeline #212676 (failed) have been uploaded.
Logfiles from GitLab pipeline #213113 (passed) have been uploaded.
Context
In timeit.py logging, the Unicode symbol \u255a (╚) is used to draw the tree hierarchy of measured time intervals. By default, Python encodes logging output using the locale's encoding. If the terminal locale is set to anything other than UTF-8 (e.g. 'latin-1'), the output codec cannot encode the symbol, and logging fails with a UnicodeEncodeError.
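The failure can be reproduced without changing the system locale by writing through a text stream that encodes as latin-1, which is what sys.stdout does when Python picks up a latin-1 locale. This is a minimal sketch: `write_line` is a hypothetical helper and the in-memory stream stands in for the terminal.

```python
import io


def write_line(line: str, encoding: str) -> bytes:
    """Write `line` through a text stream using `encoding` and return the bytes produced.

    Raises UnicodeEncodeError if `encoding` cannot represent a character in `line`.
    """
    raw = io.BytesIO()
    # errors="strict" is the TextIOWrapper default, as for a locale-configured sys.stdout;
    # newline="\n" disables newline translation so the bytes are the same on every OS.
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    stream.write(line + "\n")
    stream.flush()
    return raw.getvalue()


line = "\u255a compute: 1.23 s"
print(write_line(line, "utf-8"))  # b'\xe2\x95\x9a compute: 1.23 s\n'
try:
    write_line(line, "latin-1")
except UnicodeEncodeError as exc:
    print(type(exc).__name__)  # UnicodeEncodeError
```

Inside a logging.StreamHandler the exception does not reach the caller: emit() catches it and calls handleError(), which by default (logging.raiseExceptions is True) prints a "--- Logging error ---" traceback to sys.stderr, and the record is not written to that handler.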
Scope
The following solutions were considered:
1) Use an ASCII symbol instead of ╚, so that it works everywhere.
The alternatives are "+", "=", and "L", but all of them look somewhat ugly. We could also draw multi-line trees with |, + and -, but that takes a lot more space and looks somewhat ugly as well.
2) Force UTF-8-encoded output.
Most terminals now support UTF-8. If a terminal is misconfigured with another locale such as latin-1, or genuinely does not support UTF-8, the symbol will be displayed as '?' or a few garbled characters, which is ugly but readable, and the rest of the output is not mangled.
3) Read the locale's default encoding and use an ASCII symbol as the separator if it is not UTF-8.
Detecting the output encoding can be unreliable and is prone to misconfiguration.
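Options 1 and 3 both come down to which marker the renderer draws. The sketch below is hypothetical: timeit.py's real data structures and layout are not shown in this thread, so `Interval`, `render_compact`, `render_ascii_tree` and `pick_marker` are illustrative names, and the one-line-per-interval indentation is an assumption. `pick_marker` implements option 3 by trusting the stream's reported encoding, which is exactly the weak point noted in option 3: it reflects what Python was told (locale, PYTHONIOENCODING), not what the terminal or a Slurm log viewer will actually render.

```python
import codecs
import io
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    """Hypothetical timing node: a label, elapsed seconds, and nested intervals."""
    name: str
    seconds: float
    children: Tuple["Interval", ...] = ()


def render_compact(node: Interval, marker: str, depth: int = 0) -> List[str]:
    """One line per interval, indented by depth and prefixed with `marker` (assumed layout)."""
    prefix = "" if depth == 0 else "  " * (depth - 1) + marker + " "
    lines = [f"{prefix}{node.name}: {node.seconds:.2f} s"]
    for child in node.children:
        lines.extend(render_compact(child, marker, depth + 1))
    return lines


def render_ascii_tree(node: Interval, prefix: str = "", is_last: bool = True,
                      is_root: bool = True) -> List[str]:
    """Option 1 variant: a classic tree drawn only with |, + and -."""
    label = f"{node.name}: {node.seconds:.2f} s"
    if is_root:
        lines, child_prefix = [label], ""
    else:
        lines = [f"{prefix}+- {label}"]
        # Keep a vertical bar under this node while it still has later siblings.
        child_prefix = prefix + ("   " if is_last else "|  ")
    for i, child in enumerate(node.children):
        last = i == len(node.children) - 1
        lines.extend(render_ascii_tree(child, child_prefix, last, is_root=False))
    return lines


def pick_marker(stream, unicode_marker: str = "\u255a", ascii_marker: str = "L") -> str:
    """Option 3: use the Unicode marker only if the stream reports a UTF-8 encoding."""
    encoding = getattr(stream, "encoding", None)
    try:
        # codecs.lookup normalises aliases such as "UTF8" or "utf_8" to "utf-8".
        is_utf8 = bool(encoding) and codecs.lookup(encoding).name == "utf-8"
    except LookupError:  # unknown or misspelled encoding name
        is_utf8 = False
    return unicode_marker if is_utf8 else ascii_marker


timings = Interval("total", 3.0, (
    Interval("load", 1.0, (Interval("parse", 0.5),)),
    Interval("compute", 2.0),
))
latin1_stream = io.TextIOWrapper(io.BytesIO(), encoding="latin-1")
print("\n".join(render_compact(timings, pick_marker(latin1_stream))))
print("\n".join(render_ascii_tree(timings)))
```

The demo only prints ASCII so that it runs even under a non-UTF-8 locale; passing "\u255a" as the marker gives the current look.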
I propose option 2.
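A minimal sketch of option 2, assuming (as in the reconfigure snippet earlier in this thread) that the timing handler writes to sys.stdout; `make_utf8_stream_handler` and the logger name are hypothetical. TextIOWrapper.reconfigure() is available from Python 3.7; streams that lack it (for example some captured or notebook streams) are passed through unchanged, so the original error can still appear there. The demo stands in for a latin-1 stdout with an in-memory stream.

```python
import io
import logging
import sys
from typing import Optional, TextIO


def make_utf8_stream_handler(stream: Optional[TextIO] = None) -> logging.StreamHandler:
    """Switch `stream` (default: sys.stdout) to UTF-8, then wrap it in a StreamHandler."""
    if stream is None:
        stream = sys.stdout
    # TextIOWrapper.reconfigure() exists since Python 3.7; other stream types are used as-is.
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8")
    return logging.StreamHandler(stream)


def demo() -> bytes:
    """Log one line through a stand-in for a latin-1 stdout and return the bytes written."""
    raw = io.BytesIO()
    fake_stdout = io.TextIOWrapper(raw, encoding="latin-1", newline="\n")
    logger = logging.getLogger("timeit_utf8_demo")  # hypothetical logger name
    logger.propagate = False
    logger.setLevel(logging.INFO)
    handler = make_utf8_stream_handler(fake_stdout)
    logger.addHandler(handler)
    try:
        logger.info("\u255a compute: 1.23 s")  # StreamHandler.emit() flushes the stream
    finally:
        logger.removeHandler(handler)
    return raw.getvalue()


print(demo())  # b'\xe2\x95\x9a compute: 1.23 s\n'
```

If the handler instead uses StreamHandler's default stream, sys.stderr, the same reconfigure call applies to sys.stderr.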
Testing
As it requires interaction with the terminal, I think manual testing is reasonable here.
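Manual checks remain the most faithful test of a real terminal or Slurm log. If an automated regression check is wanted later, one option is to run a child interpreter with the PYTHONIOENCODING environment variable set to latin-1, which makes its standard streams use that encoding regardless of the locale. This only checks the Python side (whether the bytes can be written), not how a terminal renders them; the helper name is hypothetical.

```python
import os
import subprocess
import sys

# Build ╚ inside the child with chr() to avoid quoting/escaping issues in the -c string.
CHILD_SYMBOL_EXPR = "chr(0x255A)"


def run_with_latin1_stdout(code: str) -> subprocess.CompletedProcess:
    """Run `code` in a child interpreter whose standard streams are forced to latin-1."""
    env = dict(os.environ, PYTHONIOENCODING="latin-1")
    return subprocess.run([sys.executable, "-c", code], env=env, capture_output=True)


plain = run_with_latin1_stdout(f"print({CHILD_SYMBOL_EXPR})")
fixed = run_with_latin1_stdout(
    f"import sys; sys.stdout.reconfigure(encoding='utf-8'); print({CHILD_SYMBOL_EXPR})"
)
print(plain.returncode != 0, b"UnicodeEncodeError" in plain.stderr)  # True True
print(fixed.returncode == 0, fixed.stdout.rstrip() == b"\xe2\x95\x9a")  # True True
```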
Review