This is a somewhat unusual encoding problem, since it occurs while encoding the file path.
Could you check which encoding your Python uses by default? To see it, launch Python interactively (command python) and then run the code below:
import sys
print(sys.getfilesystemencoding())
It should look like this:
> python
Python 3.12.4 (main, Jun 7 2024, 00:00:00) [GCC 14.1.1 20240607 (Red Hat 14.1.1-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print(sys.getfilesystemencoding())
utf-8
>>>
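If it helps, here is a slightly fuller check you could run instead. It is only a sketch: the helper name encoding_report is made up for this example, and it just gathers the standard-library values and environment variables that usually influence how Python encodes file paths.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the settings that usually influence file path encoding.

    Hypothetical helper for this thread, not part of the export script.
    """
    return {
        "filesystem_encoding": sys.getfilesystemencoding(),
        # Error handler used together with the filesystem encoding (Python 3.6+).
        "filesystem_errors": sys.getfilesystemencodeerrors(),
        "preferred_encoding": locale.getpreferredencoding(False),
        # Locale-related environment variables; None means "not set".
        "LANG": os.environ.get("LANG"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
        "LC_ALL": os.environ.get("LC_ALL"),
    }


for key, value in encoding_report().items():
    print(f"{key}: {value}")
```

If filesystem_encoding comes back as ascii, non-ASCII titles in filenames are likely to fail.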
Also remember that for really problematic characters in filenames (like /, which is impossible to use on Linux), you can use the -c parameter to replace those characters with the _ character. But here it should not be necessary.
Yep, that seems to be the issue: the file system encoding seems to be ascii, while the default python3 encoding is set to utf-8.
I'm not really sure how to work around this; excluding these chars using -c does not work.
Yeah, now that I take a closer look, I see that the exception happened not when trying to open and save the file, but before that, when os.stat(path) is called.
This suggests that the proper fix will not be inside the script.
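To illustrate why the failure shows up so early: on POSIX systems, Python encodes a str path with the filesystem encoding (using the surrogateescape error handler) before passing it to calls such as os.stat. The sketch below imitates that step with an explicit ascii codec, so it does not depend on your locale. The helper name first_unencodable is made up for this example, and the title is the one from the report.

```python
def first_unencodable(path, encoding="ascii"):
    """Return the first character of `path` that `encoding` cannot represent.

    Hypothetical helper: it mimics, under the stated assumption, the
    str-to-bytes step that happens before the OS ever sees the path.
    """
    try:
        path.encode(encoding, "surrogateescape")
    except UnicodeEncodeError as exc:
        return path[exc.start]
    return None


title = "Información"
bad = first_unencodable(title)
# ascii() keeps the output printable even on an ascii-only terminal.
print(ascii(bad))  # prints '\xf3'
```

With an ascii filesystem encoding, any path containing such a character fails at this encoding step, regardless of what the script does with the file afterwards.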
I was able to reproduce this error using a container with Python 3.6, where after setting the variables export LC_CTYPE=en_NZ; export LANG=en_NZ; sys.getfilesystemencoding() returned ascii. With the latest Python, which is 3.12 on my system, it instead returned iso8859-1, which is an extended version of ASCII, and there the characters àáèéìíòóùú worked. Currently the default for LANG on modern Linux systems is typically en_US.UTF-8, and with that you should get full utf-8 support here.
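As a quick illustration of the difference between these three encodings, the sketch below checks which of them can represent the sample characters from the report. The helper can_encode is made up for this example, and the euro sign is my own addition to show a character that iso8859-1 cannot represent but utf-8 can.

```python
SAMPLE = "àáèéìíòóùú"  # characters from the original report
EURO = "€"             # extra test character, not from the report


def can_encode(text, encoding):
    """Return True if every character of `text` fits in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for enc in ("ascii", "iso8859-1", "utf-8"):
    print(enc, can_encode(SAMPLE, enc), can_encode(EURO, enc))
# ascii False False
# iso8859-1 True False
# utf-8 True True
```

So iso8859-1 happens to cover these particular titles, but only utf-8 is safe for arbitrary ones.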
You should probably try fixing the system locale and/or upgrading Python. You could try setting these variables before launching Python:
export LC_CTYPE=en_US.UTF-8;
export LANG=en_US.UTF-8;
python;
to see if the problem is fixed. It might not be if those locales are not installed on your system.
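One way to check whether a locale is installed from within Python is to ask the C library to switch to it and see whether that succeeds. This is a rough sketch: the helper name locale_is_available is made up, the candidate names are only examples, and some C libraries accept almost any locale name, so a True result is a hint rather than a guarantee.

```python
import locale


def locale_is_available(name):
    """Return True if the C library accepts `name` for LC_CTYPE.

    Hypothetical helper; it restores the previous setting afterwards.
    """
    previous = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, previous)


for candidate in ("en_US.UTF-8", "C.UTF-8", "C"):
    print(candidate, locale_is_available(candidate))
```

If en_US.UTF-8 shows False, exporting the variables above will most likely not change the filesystem encoding until that locale is installed.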
I'm currently locked at Python 3.6. Now that the issue is clear, I'm going to spin up a Docker container with Python to run the export process without the system limitations 👍
Thank you very much for the support, it really helped point me in the right direction!
Yes, containers are a good way to go. I will close this for now; feel free to reopen if the encoding issues persist.
Hi,
When I have books/chapters/pages that have non-standard chars in the title (for example: "àáèéìíòóùú"), the export crashes with an encoding error as shown below. As a workaround I'm removing said chars, but that is time consuming when working together with other people.
Title in this example was "Información" and "ó" was encoded to "\xf3".
The issue mainly occurs when the title with these characters has to go into the exported filename. For example, if I export as BOOKS (with no book having non-standard characters), page titles containing said characters don't produce the error, but if I then export the same content as PAGES, the error is triggered.
I'm sure it's probably not a difficult fix, but I'm not experienced enough in Python to do it myself (tried without luck!). Any help would be greatly appreciated.
(Note that the line numbers may differ, as I modified the file to ignore SSL errors.)