This is a first draft to fix the encoding errors described in #55 - i.e. we're on Windows and the input file is encoded in UTF-8.
The role of PYTHONIOENCODING, as described in the issue
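The bug report mentions that even when the PYTHONIOENCODING env var is set, the preferred encoding determined by Python on Windows is still a Windows-specific encoding.
However, this looks like the expected behaviour: as far as I can tell, this env var only affects the encoding of stdin/stdout/stderr, not the default encoding used when opening files.
![PYTHONIOENCODING-not-changing-preferred-encoding](https://user-images.githubusercontent.com/722388/173562913-792ff701-1321-4ea8-b418-5ae097b1b425.png)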
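For anyone who wants to check this on their own machine, here is a minimal sketch (not part of this PR). It starts a child interpreter with extra env vars and reports both the stdout encoding and the preferred encoding. The `probe_encodings` helper is a name I made up for illustration.

```python
import os
import subprocess
import sys


def probe_encodings(extra_env):
    """Run a child interpreter and return (stdout encoding, preferred encoding)."""
    code = (
        "import locale, sys;"
        "print(sys.stdout.encoding);"
        "print(locale.getpreferredencoding(False))"
    )
    # Start from the current environment and add the variables under test.
    env = {**os.environ, **extra_env}
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    # The child prints one encoding name per line.
    stdout_encoding, preferred_encoding = result.stdout.split()
    return stdout_encoding, preferred_encoding


if __name__ == "__main__":
    print(probe_encodings({"PYTHONIOENCODING": "utf-8"}))
```

If the reading above is right, the first value follows PYTHONIOENCODING while the second one still reflects the system locale.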
Potential breaking changes brought by this PR
Of course, assuming that the input file is always encoded in UTF-8, like we do with this PR, could break some existing usages of Rich-CLI, especially on Windows, where UTF-8 is still not the default encoding (if I'm not wrong).
I'm not sure what the safest way to handle that would be. 🤔
Potential ways to handle that better
Maybe we could try to open the file using the system's default encoding first, and then, only if that fails, fall back to UTF-8?
e.g. something like this (pseudo-code):

    for encoding in (None, "utf-8"):  # None means the system's default encoding
        try:
            with open(path, "rt", encoding=encoding) as resource_file:
                content = resource_file.read()
            break  # decoded successfully, no need to try the next encoding
        except UnicodeDecodeError:
            continue
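As pointed out by @darrenburns, it is now possible to use the PYTHONUTF8 env var, which seems to work:
![using-PYTHONUTF8](https://user-images.githubusercontent.com/722388/173569356-7f3a7f3e-0aeb-4116-9f68-b17cf0300570.png)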
Add a flag and/or an env var specific to Rich-CLI to let the user tell Rich-CLI which encoding it should use to open the input file?
Maybe that would be the most flexible option, combined with a "before raising an exception, fall back to UTF-8 if the default encoding didn't work" strategy? What do you think, @willmcgugan @darrenburns? :slightly_smiling_face:
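To make that last idea more concrete, here is a rough sketch of how an explicit encoding and the UTF-8 fallback could be combined. It is only an illustration, not what this PR implements: the `read_text_with_fallback` helper and the `RICH_CLI_ENCODING` env var are hypothetical names that do not exist in Rich-CLI.

```python
import os

# Hypothetical name: Rich-CLI does not define this variable, it only illustrates the idea.
ENCODING_ENV_VAR = "RICH_CLI_ENCODING"


def read_text_with_fallback(path, encoding=None):
    """Read a text file, preferring an explicit encoding over any guessing.

    Order: explicit argument, then the (hypothetical) env var,
    then the system's default encoding, then UTF-8.
    """
    explicit = encoding or os.environ.get(ENCODING_ENV_VAR)
    if explicit:
        # The user asked for a specific encoding: use it and let errors surface.
        with open(path, "rt", encoding=explicit) as resource_file:
            return resource_file.read()

    last_error = None
    for candidate in (None, "utf-8"):  # None means the system's default encoding
        try:
            with open(path, "rt", encoding=candidate) as resource_file:
                return resource_file.read()
        except UnicodeDecodeError as error:
            last_error = error
    # Neither the default encoding nor UTF-8 could decode the file.
    raise last_error
```

One caveat with this approach: the fallback only kicks in when the default codec actually raises. Some single-byte Windows code pages accept almost any byte sequence, so a UTF-8 file may be decoded without error but with the wrong characters, which is one more argument for letting the user state the encoding explicitly.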
Before / After
Before this fix:
After this fix:
fixes #55