DeepBlueCLtd / LegacyMan

Legacy content for Field Service Manual
https://deepbluecltd.github.io/LegacyMan/index.html
Apache License 2.0

Content checker falling over when it encounters a `.dita` file with unexpected characters #597

Closed: IanMayo closed 9 months ago

IanMayo commented 9 months ago

The HTML source files contain rare use of ï and quite common use of ä (94 times across 31 files).

These get converted to DITA ok, and then to HTML satisfactorily.

But on MS-Windows the content-checker falls over when trying to load a .dita file containing one of these characters.

The error occurs in the target_path.read_text() call in check_files.py

UnicodeDecodeError: `charmap` codec can't decode byte 0x8f in position 154: character maps to <undefined>

Sadly, I don't think the error is present on macOS - I tried to do a content-check for all files, and it didn't fall over. So, I guess it's MS-Win only.
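The platform difference is consistent with `read_text()` using the locale's default encoding when none is given. That default is usually UTF-8 on macOS and often cp1252 on MS-Windows. Here is a minimal sketch that should reproduce the same kind of error on any platform by requesting cp1252 explicitly. The file name and sample text are made up for illustration, and cp1252 being the Windows default here is an assumption rather than something confirmed on the affected machine.

```python
import tempfile
from pathlib import Path

# Hypothetical sample text. In UTF-8, "Ï" is stored as bytes 0xC3 0x8F,
# and 0x8F is one of the few bytes that cp1252 leaves undefined.
sample = "<p>NAÏVE</p>"

with tempfile.TemporaryDirectory() as tmp:
    target_path = Path(tmp) / "sample.dita"  # illustrative file name
    target_path.write_bytes(sample.encode("utf-8"))

    # Simulate an assumed cp1252 default by asking for it explicitly.
    try:
        target_path.read_text(encoding="cp1252")
        error = None
    except UnicodeDecodeError as exc:
        error = exc

print(error)

# By contrast, "ä" (0xC3 0xA4 in UTF-8) decodes under cp1252 without an
# error, but yields two wrong characters instead of one correct one.
mojibake = "ä".encode("utf-8").decode("cp1252")
print(mojibake)
```

If this holds, the byte 0x8f in the traceback would come from a character other than ä, since ä on its own is silently mis-decoded rather than rejected.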

Test data in #596

robintw commented 9 months ago

Can you do an in-place modification to that method call and see if it fixes it? Obviously there's no point me trying here on Mac.

Replace:

target_path.read_text()

with

target_path.read_text(encoding='utf8')

If that doesn't work, you can try the same but with encoding='cp1252'.
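The two suggestions above could also be combined, roughly as in the sketch below: try UTF-8 first, then fall back to cp1252. `read_text_with_fallback` is a hypothetical helper, not something that exists in check_files.py, and the sample file names and text are invented for the demonstration.

```python
import tempfile
from pathlib import Path


def read_text_with_fallback(path, encodings=("utf-8", "cp1252")):
    """Hypothetical helper: return the text from the first encoding that decodes cleanly."""
    if not encodings:
        raise ValueError("at least one encoding is required")
    last_error = None
    for encoding in encodings:
        try:
            return path.read_text(encoding=encoding)
        except UnicodeDecodeError as exc:
            last_error = exc  # remember the failure and try the next encoding
    raise last_error


with tempfile.TemporaryDirectory() as tmp:
    text = "Länge naïve"  # illustrative content using ä and ï

    utf8_file = Path(tmp) / "utf8.dita"
    utf8_file.write_bytes(text.encode("utf-8"))

    legacy_file = Path(tmp) / "legacy.dita"
    legacy_file.write_bytes(text.encode("cp1252"))

    utf8_text = read_text_with_fallback(utf8_file)
    legacy_text = read_text_with_fallback(legacy_file)

print(utf8_text == legacy_text == text)
```

The order matters: cp1252 accepts almost any byte sequence, so trying it first would silently mis-decode UTF-8 files instead of raising. If every file is known to be UTF-8, passing `encoding='utf8'` directly is simpler and makes a bad file fail loudly.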

IanMayo commented 9 months ago

First one fixed it :-D

robintw commented 9 months ago

Excellent, are you happy to do a PR?