riffxtract - block type decode to use full 0-255 character range

greg-kennedy commented 3 years ago

On my system some test files crash at line 242 with errors like this:

Traceback (most recent call last):
  File "../riffxtract", line 347, in <module>
    parse_riff(fileContent, rifx_offset)
  File "../riffxtract", line 242, in parse_riff
    if (block_type[c].decode('ascii') > '\0' and
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)

Python's "ascii" codec supports only characters in the range 0-127. The "latin1" codec maps every byte in the full 0-255 range to a character, so decoding can never fail. Since these upper characters are replaced with _ anyway, the fix seems OK to me.
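A minimal sketch of the proposed change, for illustration only. `sanitize_block_type` is a hypothetical helper, not riffxtract's actual code, and the rule "keep printable ASCII, replace everything else with _" is an assumption based on the comment above. It shows that a strict ASCII decode raises on a byte such as 0xe5, while a latin1 decode accepts it.

```python
def sanitize_block_type(raw: bytes) -> str:
    """Hypothetical helper: turn a chunk ID into a filename-safe string.

    latin1 maps each byte 0x00-0xFF to U+0000-U+00FF, so this decode never raises.
    Characters outside printable ASCII (0x21-0x7E) are replaced with '_'.
    """
    text = raw.decode('latin1')
    return ''.join(ch if '\x20' < ch < '\x7f' else '_' for ch in text)


if __name__ == '__main__':
    raw = b'\xe5ABC'
    try:
        raw.decode('ascii')  # strict ASCII rejects bytes >= 0x80
    except UnicodeDecodeError as err:
        print('ascii decode failed:', err)
    print(sanitize_block_type(raw))
```

With the input above, the ASCII decode raises the same kind of `UnicodeDecodeError` shown in the traceback, while `sanitize_block_type(b'\xe5ABC')` returns `'_ABC'`.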

System25 commented 3 years ago

Thank you for your interest in this project. Please provide an example RIFF file with an embedded file whose name contains that non-ASCII character.

One of the problems with Director files is that projects can use different character encodings (including Japanese ones), so I'd like to be sure that latin1 is enough.

greg-kennedy commented 3 years ago

I have placed four example files here, taken from the Director-based "Inbred with Rednex".

https://greg-kennedy.com/_greg/rednex

Here are the error messages.

MATTI.DIR =>
Traceback (most recent call last):
  File "../riffxtract", line 347, in <module>
    parse_riff(fileContent, rifx_offset)
  File "../riffxtract", line 242, in parse_riff
    if (block_type[c].decode('ascii') > '\0' and
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)

PRESSCUT.DIR =>
Traceback (most recent call last):
  File "../riffxtract", line 347, in <module>
    parse_riff(fileContent, rifx_offset)
  File "../riffxtract", line 242, in parse_riff
    if (block_type[c].decode('ascii') > '\0' and
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd6 in position 0: ordinal not in range(128)

PROJEC.DIR =>
Traceback (most recent call last):
  File "../riffxtract", line 347, in <module>
    parse_riff(fileContent, rifx_offset)
  File "../riffxtract", line 242, in parse_riff
    if (block_type[c].decode('ascii') > '\0' and
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfe in position 0: ordinal not in range(128)

REDNEX.DIR =>
Traceback (most recent call last):
  File "../riffxtract", line 347, in <module>
    parse_riff(fileContent, rifx_offset)
  File "../riffxtract", line 242, in parse_riff
    if (block_type[c].decode('ascii') > '\0' and
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8a in position 0: ordinal not in range(128)

System25 commented 3 years ago

Thank you very much for the test files! After analyzing one of them, I realized that the bug was not caused by the ASCII encoding of the characters. The code that extracts the RIFF file content did not skip the padding byte when a chunk's length is odd. (Please check the file format description at https://en.wikipedia.org/wiki/Resource_Interchange_File_Format#Explanation .) It specifies that the chunk identifier is ASCII-encoded, and it describes the pad byte that was missing.
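A minimal sketch of walking consecutive RIFF chunks while honoring the pad byte. This is an illustration, not the code from the drxtract commit linked below; `iter_chunks` and its `big_endian` flag are hypothetical. The flag is there because Director files may store chunk sizes big-endian (RIFX) rather than little-endian (RIFF), and which one applies is an assumption the caller must make.

```python
import struct
from typing import Iterator, Tuple


def iter_chunks(data: bytes, offset: int = 0,
                big_endian: bool = False) -> Iterator[Tuple[bytes, int, bytes]]:
    """Hypothetical helper: yield (chunk_id, chunk_offset, chunk_data) for each chunk.

    Each chunk is a 4-byte ID, a 4-byte size, then `size` bytes of data.
    If `size` is odd, one pad byte follows the data and is not counted in `size`.
    """
    size_fmt = '>I' if big_endian else '<I'
    while offset + 8 <= len(data):
        chunk_id = data[offset:offset + 4]
        (size,) = struct.unpack_from(size_fmt, data, offset + 4)
        start = offset + 8
        yield chunk_id, offset, data[start:start + size]
        # Skip the data plus the pad byte when size is odd (size & 1 is 0 or 1).
        offset = start + size + (size & 1)
```

For example, with `data = b'AAAA' + struct.pack('<I', 3) + b'xyz\x00' + b'BBBB' + struct.pack('<I', 2) + b'hi'`, `list(iter_chunks(data))` yields `(b'AAAA', 0, b'xyz')` and `(b'BBBB', 12, b'hi')`. Without the `(size & 1)` term, the second read would start on the pad byte, and the chunk ID would come out as `b'\x00BBB'`. Misaligned reads like this can produce arbitrary bytes in the ID position, which would explain the non-ASCII bytes in the tracebacks.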

So the following bugfix has been applied: https://github.com/System25/drxtract/commit/2f93a2184e24acb1c1f7cab44822bc8aabc337de

Thank you very much for your cooperation!

System25 / drxtract

riffxtract - block type decode to use full 0-255 character range #8