[BBPBGLIB-739] Override UTF-8 for logging

atemerev commented 5 months ago

Context

In timeit.py logging, the Unicode symbol \u255a (╚) is used to draw the tree hierarchy of the measured time intervals. Python logging writes to a stream that uses the locale encoding by default. If the terminal locale is set to anything other than UTF-8 (e.g. 'latin-1'), Python's output codec cannot encode the symbol and fails with a UnicodeEncodeError exception.
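
A minimal reproduction of the failure, as a sketch: the in-memory `io.TextIOWrapper` below is only a stand-in for a terminal stream opened with a latin-1 locale, and `try_write` is a hypothetical helper, not code from timeit.py.

```python
import io

# Stand-in for a terminal-bound stream that uses a non-UTF-8 locale encoding.
latin1_stream = io.TextIOWrapper(io.BytesIO(), encoding="latin-1")


def try_write(stream, text):
    """Write text to the stream; return the encoding error, or None on success."""
    try:
        stream.write(text)
        stream.flush()
    except UnicodeEncodeError as exc:
        return exc
    return None


error = try_write(latin1_stream, "\u255a child interval\n")
print(type(error).__name__)  # prints "UnicodeEncodeError"
```

The same write succeeds when the wrapper is created with `encoding="utf-8"`, which is the behaviour option 2 below relies on.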

Scope

The following solutions were considered:

1) Use an ASCII symbol instead of ╚, so it could work everywhere.

The alternatives are "+", "=", and "L", but they all look sort of ugly. We could also draw multi-line trees with |, + and -, but that takes a lot more space and looks sort of ugly as well.

2) Force the UTF-8 encoded output.

Most terminals now support UTF-8. If a terminal is misconfigured to another locale such as latin-1, or genuinely does not support UTF-8, the symbol will be rendered as '?', which is ugly but readable, and it does not mangle the output.

3) Read the locale-default encoding, and use an ASCII-based symbol as a separator if it is not UTF-8.

Output encoding determination can be unreliable, and it is prone to misconfiguration.
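
For comparison, option 3 could look roughly like the sketch below. The names `branch_symbol` and `is_utf8` and the "+" fallback are hypothetical, chosen only for illustration; the check relies on `locale.getpreferredencoding`, which reports whatever the environment claims and is therefore exactly the part that can be misconfigured.

```python
import codecs
import locale

UNICODE_BRANCH = "\u255a"  # the symbol currently used
ASCII_BRANCH = "+"         # hypothetical ASCII fallback


def is_utf8(encoding_name):
    """Return True if the name resolves to Python's UTF-8 codec."""
    try:
        return codecs.lookup(encoding_name).name == "utf-8"
    except LookupError:
        # Unknown or empty encoding names are treated as non-UTF-8.
        return False


def branch_symbol(encoding_name=None):
    """Pick the tree symbol from an encoding name (locale default if None)."""
    if encoding_name is None:
        encoding_name = locale.getpreferredencoding(False)
    return UNICODE_BRANCH if is_utf8(encoding_name) else ASCII_BRANCH
```

With this, `branch_symbol("latin-1")` returns "+" and `branch_symbol("UTF-8")` returns "╚".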

I propose option 2.

Testing

As it requires interaction with the terminal, I think manual testing is reasonable here.

Review

[X] PR description is complete
[X] Coding style (imports, function length, New functions, classes or files) are good
[N/A] Unit/Scientific test added
[X] Updated Readme, in-code, developer documentation

bbpbuildbot commented 5 months ago

Logfiles from GitLab pipeline #202325 (:no_entry:) have been uploaded here!


jorblancoa commented 5 months ago

I agree that the best would be to enforce UTF-8; however, I tried your solution and it didn't work. I had better luck with this (in logging.py, before creating the StreamHandler):

import sys

# Ensure sys.stdout uses UTF-8 encoding (reconfigure exists on text streams since Python 3.7)
if hasattr(sys.stdout, 'reconfigure'):
    sys.stdout.reconfigure(encoding='utf-8')
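
The effect of `reconfigure` can be checked in isolation. This is only a sketch: an in-memory `io.TextIOWrapper` stands in for `sys.stdout` so that the emitted bytes can be inspected, and the logger name `demo` is hypothetical.

```python
import io
import logging

# Stand-in for sys.stdout opened with a latin-1 locale encoding.
buffer = io.BytesIO()
stream = io.TextIOWrapper(buffer, encoding="latin-1")

# Same guard as above: switch the stream to UTF-8 before the handler is created.
if hasattr(stream, "reconfigure"):
    stream.reconfigure(encoding="utf-8")

handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(message)s"))

logger = logging.getLogger("demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger

logger.info("\u255a child interval")
stream.flush()

# The symbol is now written as its three-byte UTF-8 sequence.
written = buffer.getvalue()
```

Whether those bytes then display correctly still depends on what the terminal, or anything in between such as a job scheduler, does with them.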

ferdonline commented 5 months ago

I think we tried that in the past and it failed when streaming those chars via Slurm. At least in one place I believe I changed one such char to a plain ASCII char. If solution 2 can't work easily, I'd propose we go with 1. (Maybe we can use an extended ASCII char, 128-255?)
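
Whether a character in the 128-255 range is safe depends on which single-byte code page the receiving side assumes, since ASCII proper only covers 0-127. A quick check, as a sketch; `encodable_in` is a hypothetical helper and the list of encodings is only illustrative:

```python
def encodable_in(char, encodings):
    """Map each encoding name to whether it can encode the character."""
    result = {}
    for name in encodings:
        try:
            char.encode(name)
            result[name] = True
        except UnicodeEncodeError:
            result[name] = False
    return result


report = encodable_in("\u255a", ["ascii", "latin-1", "cp437", "utf-8"])
print(report)
```

Here ╚ is encodable in CP437 and UTF-8 but not in ASCII or latin-1, so a single byte above 127 would only show the intended glyph where the terminal assumes a matching code page.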

bbpbuildbot commented 5 months ago

Logfiles from GitLab pipeline #204101 (:no_entry:) have been uploaded here!


WeinaJi commented 4 months ago

@atemerev, to solve this issue quickly, shall we replace the special character ╚ with a simpler one?

bbpbuildbot commented 3 months ago

Logfiles from GitLab pipeline #212672 (:no_entry:) have been uploaded here!


bbpbuildbot commented 3 months ago

Logfiles from GitLab pipeline #212676 (:no_entry:) have been uploaded here!


bbpbuildbot commented 3 months ago

Logfiles from GitLab pipeline #213113 (:white_check_mark:) have been uploaded here!


BlueBrain / neurodamus