Szwendacz99 / BookStack-Python-exporter

Customizable script for exporting notes from BookStack through API. Export Pages, Chapters, Books, attachments and images.
MIT License

Encoding issue when titles have non standard characters #9

Closed arnauos closed 3 months ago

arnauos commented 3 months ago

Hi,

When I have books/chapters/pages with non-standard characters in the title (for example: "àáèéìíòóùú"), the export crashes with an encoding error, as shown below. As a workaround I'm removing said characters, but that is time consuming when working together with other people.

Title in this example was "Información" and "ó" was encoded to "\xf3".

The issue mainly occurs when a title containing these characters ends up in the exported filename. For example, if I export as BOOKS (with no book title containing non-standard characters), page titles containing said characters don't produce the error, but if I then export the same content as PAGES, the error is triggered.

I'm sure it's probably not a difficult fix, but I'm not experienced enough in Python to do it myself (tried without luck!). Any help would be greatly appreciated.

DEBUG :: Page: "Informaci\xf3n", ID: 1062, last edit: 2024-07-10 16:41:04
DEBUG :: Checking for update for file /opt/export/./shelve2/book1/Informaci\xf3n.pdf
Traceback (most recent call last):
  File "exporter.py", line 551, in <module>
    export_doc(files, lvl)
  File "exporter.py", line 376, in export_doc
    if not check_if_update_needed(path, document):
  File "exporter.py", line 344, in check_if_update_needed
    if not os.path.exists(file_path):
  File "/usr/lib64/python3.6/genericpath.py", line 19, in exists
    os.stat(path)
UnicodeEncodeError: 'ascii' codec can't encode character '\xf3' in position 71: ordinal not in range(128)

(note that the line numbers may differ as I did modify the file to ignore ssl errors)

Szwendacz99 commented 3 months ago

This is a somewhat weird encoding problem, as it happens while encoding the file path.

Could you check what encoding is used by default by your python? To see it, launch python interactively (command python) and then run the code below:

import sys
print(sys.getfilesystemencoding())

It should look like this:

> python
Python 3.12.4 (main, Jun  7 2024, 00:00:00) [GCC 14.1.1 20240607 (Red Hat 14.1.1-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print(sys.getfilesystemencoding())
utf-8
>>>
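
If a bit more context is useful, the check above can be extended to also print the locale-related settings. This is only a sketch: the helper name `describe_encoding_environment` is made up for illustration and is not part of the exporter.

```python
import locale
import os
import sys


def describe_encoding_environment():
    """Collect the settings that influence how Python encodes file paths."""
    return {
        # Encoding used to convert str paths to bytes for calls like os.stat()
        "filesystem_encoding": sys.getfilesystemencoding(),
        # Encoding Python derives from the current locale settings
        "preferred_encoding": locale.getpreferredencoding(False),
        # Environment variables that select the locale (None if unset)
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
    }


for name, value in describe_encoding_environment().items():
    print(f"{name}: {value}")
```

If `filesystem_encoding` shows `ascii`, file names with characters like "ó" will most likely fail the same way as in the traceback.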
Szwendacz99 commented 3 months ago

Also remember that for really problematic characters in filenames (like /, which is impossible to use on Linux), you can use the -c parameter to replace those characters with the _ char. But here it should not be necessary.
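
Roughly, the idea of such a replacement looks like the sketch below. It is not the exporter's actual code: the function name `replace_forbidden_chars` and its default arguments are assumptions made for illustration only.

```python
def replace_forbidden_chars(title, forbidden="/", replacement="_"):
    """Return title with every character listed in `forbidden` replaced."""
    return "".join(replacement if ch in forbidden else ch for ch in title)


# "/" cannot appear in a Linux file name, so it gets replaced
print(replace_forbidden_chars("Backup/Restore"))

# "\xf3" (the "ó" from the report) is not in the forbidden set, so the title is unchanged
print(replace_forbidden_chars("Informaci\xf3n") == "Informaci\xf3n")
```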

arnauos commented 3 months ago

Yep, that seems to be the issue: the file system encoding is ascii, while the default python3 encoding is set to utf-8.

Not really sure how to work around this; excluding these chars using -c does not work.

Szwendacz99 commented 3 months ago

Yeah, now that I take a closer look, I see that the exception happened not when trying to open and save the file, but before that, when os.stat(path) is called. This suggests that the proper fix will not be inside the script.

I was able to reproduce this error using a container with Python 3.6: after setting the variables export LC_CTYPE=en_NZ; export LANG=en_NZ; sys.getfilesystemencoding() returned ascii. With the latest Python on my system, which is 3.12, it returned iso8859-1 instead, which is an extended version of ASCII, and there the characters àáèéìíòóùú worked. Currently the default LANG on modern Linux distributions is en_US.UTF-8, and with that you should get full UTF-8 support here.
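
The difference between the three encodings can be checked directly, independent of the exporter. A small sketch (the helper `can_encode` is just a name picked for this example):

```python
# The characters from the report, written as escapes: àáèéìíòóùú
SAMPLE = "\u00e0\u00e1\u00e8\u00e9\u00ec\u00ed\u00f2\u00f3\u00f9\u00fa"
# A character outside ISO 8859-1, for comparison: ł
OUTSIDE_LATIN1 = "\u0142"


def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


for encoding in ("ascii", "iso8859-1", "utf-8"):
    print(encoding, can_encode(SAMPLE, encoding), can_encode(OUTSIDE_LATIN1, encoding))
```

Only utf-8 handles both samples, which is why iso8859-1 works for these particular titles but is not a general solution.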

You probably should try fixing the system locale, and/or upgrade python. You could try setting variables before launching python:

export LC_CTYPE=en_US.UTF-8; 
export LANG=en_US.UTF-8;
python;

to see if the problem is fixed. It might not be if those locales are not installed on your system.
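
One way to check from Python whether a locale is installed is to try to activate it. This is a sketch, with `locale_is_available` being a hypothetical helper name; on the shell, `locale -a` lists the installed locales as well.

```python
import locale


def locale_is_available(name):
    """Return True if the C library accepts `name` for LC_CTYPE."""
    previous = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        # Raised when the requested locale is not installed or not recognized
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, previous)  # restore the original setting
    return True


print(locale_is_available("en_US.UTF-8"))
```

If this prints False, exporting LANG=en_US.UTF-8 will most likely not change what sys.getfilesystemencoding() returns.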

arnauos commented 3 months ago

I'm currently locked at Python 3.6. Now that the issue is clear, I'm going to spin up a Docker container with Python to run the export process without the system limitations 👍

Thank you very much for the support, really helped me pointing to the right path!

Szwendacz99 commented 3 months ago

Yes, containers are a good way. I will close for now, feel free to reopen, if encoding issues persist.