BLooperZ / nutcracker

Tools for editing resources in SCUMM games.
GNU General Public License v3.0

No proper handling of unicode chars in other languages #8

Closed d0k3 closed 1 year ago

d0k3 commented 2 years ago

This happens when trying to extract the strings from a German version of Day of the Tentacle:

    .\nutcracker.exe sputm strings_extract --textfile strings.txt .\TENTACLE.000
    [...]
    Extracting strings from game resources: TENTACLE.000
    Traceback (most recent call last):
      File "nutcracker\runner.py", line 11, in <module>
      File "typer\main.py", line 214, in __call__
      File "click\core.py", line 829, in __call__
      File "click\core.py", line 782, in main
      File "click\core.py", line 1259, in invoke
      File "click\core.py", line 1259, in invoke
      File "click\core.py", line 1066, in invoke
      File "click\core.py", line 610, in invoke
      File "typer\main.py", line 497, in wrapper
      File "nutcracker\sputm\runner.py", line 91, in extract_strings
      File "encodings\cp1252.py", line 19, in encode
    UnicodeEncodeError: 'charmap' codec can't encode character '\u05d1' in position 84: character maps to <undefined>
    [7324] Failed to execute script 'runner' due to unhandled exception!

d0k3 commented 2 years ago

As an additional comment, \u05d1 corresponds to the Hebrew letter Bet, which certainly does not turn up even in the German version of Day of the Tentacle. So the issue may be something bigger.
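
One possible explanation, offered as a guess and not confirmed anywhere in this thread: if the game stores 'ß' as byte 0xE1 (as the DOS code pages cp437 and cp850 do) and the tool decodes that byte with a Hebrew code page such as cp1255, the result is exactly \u05d1. Writing that character to a file opened with the Windows default cp1252 then fails as in the traceback. A minimal sketch of that chain in plain Python, without any nutcracker code:

```python
# Assumption (not confirmed by the thread): the game text uses a DOS code
# page in which 0xE1 is the German sharp s, and the tool decoded it as cp1255.
raw = b"\xe1"

as_dos = raw.decode("cp437")      # the character the German text intends
as_hebrew = raw.decode("cp1255")  # the same byte read with a Hebrew code page

print(repr(as_dos), repr(as_hebrew))

# Encoding the Hebrew letter with cp1252 fails, because cp1252 has no
# Hebrew characters at all.
try:
    as_hebrew.encode("cp1252")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print(exc)
```

If that guess is right, the crash on output is only a symptom, and the decoding step is where the wrong character gets introduced.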

BLooperZ commented 2 years ago

Thank you. I'm still considering how I would like to address this issue (accept an encoding as input, write raw bytes, or just ignore encoding errors).
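
For readers comparing the three options mentioned above, here is a rough sketch of what each could look like in plain Python. The function names and the sample string are hypothetical and are not taken from nutcracker; only the standard `str.encode` behaviour is relied on.

```python
import io

def write_with_encoding(lines, stream, encoding="utf-8"):
    """Option 1: let the caller choose the output encoding (hypothetical helper)."""
    for line in lines:
        stream.write(line.encode(encoding) + b"\n")

def write_raw(raw_lines, stream):
    """Option 2: pass the original bytes through without decoding them."""
    for raw in raw_lines:
        stream.write(raw + b"\n")

def write_lenient(lines, stream, encoding="cp1252", errors="ignore"):
    """Option 3: keep a fixed encoding but skip characters it cannot represent."""
    for line in lines:
        stream.write(line.encode(encoding, errors=errors) + b"\n")

# Made-up sample containing the character from the traceback.
sample = ["Stra\u05d1e"]

utf8_out = io.BytesIO()
write_with_encoding(sample, utf8_out)       # works: UTF-8 covers Hebrew

raw_out = io.BytesIO()
write_raw([b"Stra\xe1e"], raw_out)          # bytes are written untouched

lenient_out = io.BytesIO()
write_lenient(sample, lenient_out)          # the unencodable letter is dropped
```

The trade-off is roughly this: option 1 puts the burden on the user to know the game's encoding, option 2 is lossless but produces a file that editors may display incorrectly, and option 3 never crashes but silently loses characters.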

BLooperZ commented 2 years ago

The latest update might fix this issue.