garcia / simfile

A modern simfile parsing & editing library for Python 3
MIT License

Non-strict parsing doesn't permit Unicode decode errors #45

Open florczakraf opened 5 months ago

florczakraf commented 5 months ago

Philosophy of Dear World from Anime Extravaganza 3 has some random(?) bytes among its properties, which prevent the simfile library from decoding the file.

.sm excerpt:

#OFFSET:-0.681;
#BPMS:0.000=174.000;
#A>^W^H0$<ED>^C<A0><9D>x:;
#BGCHANGES:;
#FGCHANGES:;

hexdump output:

00000160  53 45 54 3a 2d 30 2e 36  38 31 3b 0a 23 42 50 4d  |SET:-0.681;.#BPM|
00000170  53 3a 30 2e 30 30 30 3d  31 37 34 2e 30 30 30 3b  |S:0.000=174.000;|
00000180  0a 23 41 3e 17 08 30 24  ed 03 a0 9d 78 3a 3b 0a  |.#A>..0$....x:;.|
00000190  23 42 47 43 48 41 4e 47  45 53 3a 3b 0a 23 46 47  |#BGCHANGES:;.#FG|
000001a0  43 48 41 4e 47 45 53 3a  3b 0a 2f 2f 2d 2d 2d 2d  |CHANGES:;.//----|

errors thrown from simfile:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 392: invalid continuation byte
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 395: character maps to <undefined>
UnicodeDecodeError: 'cp932' codec can't decode byte 0xed in position 392: illegal multibyte sequence
UnicodeDecodeError: 'cp949' codec can't decode byte 0xed in position 392: illegal multibyte sequence
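
The failures can be reproduced with the standard library alone by decoding just the garbled line from the hexdump. This is a minimal sketch, not simfile's actual code path: the codec list is taken from the error messages above, `cp1252` is my guess at the codec behind the `'charmap'` error, and the names `GARBLED_LINE`, `LINE_OFFSET` and `failures` are made up for illustration.

```python
# The 15 bytes of the garbled property line, copied from the hexdump above
# (file offsets 0x181 through 0x18f, i.e. "#A>...x:;" plus the newline).
GARBLED_LINE = bytes.fromhex("23413e17083024ed03a09d783a3b0a")
LINE_OFFSET = 0x181  # where this excerpt starts in the .sm file

# Codec names from the error messages; "cp1252" is an assumption standing in
# for the codec that reported itself as 'charmap'.
CODECS = ["utf-8", "cp1252", "cp932", "cp949"]

failures = {}
for codec in CODECS:
    try:
        GARBLED_LINE.decode(codec)
    except UnicodeDecodeError as exc:
        # exc.start is relative to the excerpt, so add the excerpt's offset
        # to get the position within the whole file.
        failures[codec] = LINE_OFFSET + exc.start

for codec, position in failures.items():
    print(f"{codec}: decode fails at file position {position}")
```

Every codec in the list rejects the line, so trying more strict decoders one after another does not help here.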

StepMania (I tested with ITGmania, but I doubt it has changed) only logs:

00:00.451: Song file "/Songs/Anime Extravaganza 3/Philosophy of Dear World/Philosophy of Dear World.sm" has an unexpected value named "A0$���X".

and happily proceeds with processing the chart -- it's visible and playable.

I'd expect simfile to be able to load such garbled data in strict=False mode for the sake of compatibility.
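
To illustrate the kind of lenient decoding I have in mind, here is a sketch using only Python's built-in error handlers on the same bytes. It says nothing about how simfile should implement this internally; the variable names are hypothetical, and which handler (if any) fits the library is an open question.

```python
# The garbled property line from the hexdump (file offsets 0x181 to 0x18f).
GARBLED_LINE = bytes.fromhex("23413e17083024ed03a09d783a3b0a")

# errors="replace" substitutes U+FFFD for each undecodable byte instead of
# raising, so the rest of the line (and file) can still be parsed.
replaced = GARBLED_LINE.decode("utf-8", errors="replace")

# errors="surrogateescape" maps each undecodable byte to a lone surrogate,
# which lets the original bytes be restored when encoding with the same handler.
escaped = GARBLED_LINE.decode("utf-8", errors="surrogateescape")
restored = escaped.encode("utf-8", errors="surrogateescape")

print(repr(replaced))
print(restored == GARBLED_LINE)
```

`replace` is lossy but always yields a printable string, while `surrogateescape` would allow writing the file back byte-for-byte; either would let the parser reach the `#BGCHANGES` and later properties.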