Closed OmegaPhil closed 11 years ago
This was hiding the following error:
Traceback (most recent call last):
  File "/mnt/Storage_1/Desktop Files/Linux Programming/Python/animecheck/animecheck.py", line 1233, in sfv_create_mode
    files = recursive_file_search(files)
  File "/mnt/Storage_1/Desktop Files/Linux Programming/Python/animecheck/animecheck.py", line 445, in recursive_file_search
    for directory_path, _, directory_files in os.walk(path):
  File "/usr/lib/python2.7/os.py", line 294, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/usr/lib/python2.7/os.py", line 294, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/usr/lib/python2.7/os.py", line 294, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/usr/lib/python2.7/os.py", line 294, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/usr/lib/python2.7/os.py", line 284, in walk
    if isdir(join(top, name)):
  File "/usr/lib/python2.7/posixpath.py", line 71, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0x82 in position 1: ordinal not in range(128)
This is happening due to invalid bytes in directory names, which have come about as a result of extracting Japanese zips (presumably the file names stored in the zip are not kept in a sane encoding and are instead encoded in the Japanese locale standard).
os.walk, even though it has an onerror parameter, does not protect the rest of its run with try/except (onerror is only called for errors from the os.listdir call). The error happens because a unicode path is passed: os.listdir then returns any name it cannot decode as a plain byte string, and join fails when Python 2 implicitly decodes that byte string as ASCII in order to concatenate it with the unicode path. When a byte string is passed instead (a plain Python 2 str, which is essentially a byte array), walk works, but the script dies later when attempting to write the checksum file.
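A minimal sketch of the failure, for illustration only. The path and file name are made up, and the implicit ASCII decode that Python 2 performs is written out explicitly so that the snippet also runs on Python 3:

```python
# Hypothetical values: a unicode directory path as passed to os.walk, and a
# name that os.listdir handed back as raw bytes because it could not decode it.
# 0x82 0xA0 is the Shift-JIS encoding of the hiragana character U+3042.
top = u"/some/dir"
name = b"\x82\xa0"


def join_like_python2(path, b):
    """Mimic posixpath.join's `path += '/' + b` for a unicode path and a byte name."""
    suffix = b"/" + b                      # byte string + byte string: no decoding yet
    return path + suffix.decode("ascii")   # Python 2 does this decode implicitly


message = None
position = None
try:
    join_like_python2(top, name)
except UnicodeDecodeError as error:
    message = str(error)
    position = error.start

print(message)
```

The offending byte is reported at position 1 rather than 0 because the '/' separator is prepended to the name before the decode is attempted, which matches the traceback.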
I probably need to write my own hardened walk function to detect and report these invalid directories/files. I don't think they are in a state where I could actually deal with them properly, given how much the code relies on standard path manipulation.
I have now passed a byte string to walk and then sanity checked the output. This works in Python 2, but Python 3 needs something different.
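For the Python 3 side, one possible shape is sketched below. This is not the code from animecheck: the names split_names and hardened_walk are hypothetical, and it assumes that names should be valid UTF-8 and that the top-level path itself is decodable. It walks with a bytes path so that os.walk returns raw names, reports anything undecodable, and does not descend into undecodable directories.

```python
import os


def split_names(names, encoding="utf-8"):
    """Split byte-string names into (decodable, undecodable) lists."""
    good, bad = [], []
    for name in names:
        try:
            name.decode(encoding)
        except UnicodeDecodeError:
            bad.append(name)
        else:
            good.append(name)
    return good, bad


def hardened_walk(top, encoding="utf-8"):
    """Return (valid_files, invalid_paths) found under top.

    valid_files holds text paths; invalid_paths holds the raw byte paths of
    files and directories whose names do not decode with the given encoding.
    Assumes top itself decodes cleanly.
    """
    valid_files, invalid_paths = [], []
    # A bytes path makes os.walk yield bytes, so no decoding happens inside it.
    for dir_path, dir_names, file_names in os.walk(os.fsencode(top)):
        good_dirs, bad_dirs = split_names(dir_names, encoding)
        good_files, bad_files = split_names(file_names, encoding)
        # Pruning dir_names in place stops os.walk descending into bad directories.
        dir_names[:] = good_dirs
        for name in bad_dirs + bad_files:
            invalid_paths.append(os.path.join(dir_path, name))
        for name in good_files:
            valid_files.append(os.path.join(dir_path, name).decode(encoding))
    return valid_files, invalid_paths
```

The caller can then checksum valid_files as usual and print invalid_paths as a warning, so the run completes without crashing on the bad names.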