decalage2 / oletools

oletools - python tools to analyze MS OLE2 files (Structured Storage, Compound File Binary Format) and MS Office documents, for malware analysis, forensics and debugging.
http://www.decalage.info/python/oletools

all oletools - exception when opening filenames with unicode chars on Win10 with Python 2 #246

Open decalage2 opened 6 years ago

decalage2 commented 6 years ago

On Windows 10 with Python 2.7, when opening a filename containing unicode chars from the console, the following exception is raised by olefile:

IOError: [Errno 22] invalid mode ('rb') or filename: '...'

The same file works fine with Python 3.

Potential solution: https://github.com/Drekin/win-unicode-console

decalage2 commented 6 years ago

It looks like the issue is neither in oletools nor in optparse/argparse: on Python 2 for Windows, sys.argv does not support unicode. win-unicode-console provides a solution:

Similarly to the input from sys.stdin the arguments in sys.argv are also bytes on Python 2 and the original ones may not be reconstructable. To overcome this we add unicode_argv module. The function unicode_argv.get_unicode_argv returns Unicode version of sys.argv obtained by WinAPI functions GetCommandLineW and CommandLineToArgvW. The function unicode_argv.enable monkeypatches sys.argv with the Unicode arguments.
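For reference, the mechanism described in that passage can be sketched with ctypes. This is only a rough illustration, not the actual win-unicode-console code: the function name get_unicode_argv_sketch is made up here, and trimming the interpreter's own arguments by comparing lengths with sys.argv is an assumed heuristic. On non-Windows platforms it simply returns a copy of sys.argv.

```python
import sys


def get_unicode_argv_sketch():
    """Return the command-line arguments as Unicode strings (hypothetical helper)."""
    if sys.platform != "win32":
        # Nothing to fix outside Windows: return a copy of sys.argv.
        return list(sys.argv)

    import ctypes
    from ctypes import wintypes

    # GetCommandLineW returns the full command line as a wide string.
    GetCommandLineW = ctypes.windll.kernel32.GetCommandLineW
    GetCommandLineW.argtypes = []
    GetCommandLineW.restype = wintypes.LPCWSTR

    # CommandLineToArgvW splits it into an array of wide strings.
    CommandLineToArgvW = ctypes.windll.shell32.CommandLineToArgvW
    CommandLineToArgvW.argtypes = [wintypes.LPCWSTR, ctypes.POINTER(ctypes.c_int)]
    CommandLineToArgvW.restype = ctypes.POINTER(wintypes.LPWSTR)

    # The array returned by CommandLineToArgvW must be released with LocalFree.
    LocalFree = ctypes.windll.kernel32.LocalFree
    LocalFree.argtypes = [ctypes.c_void_p]
    LocalFree.restype = ctypes.c_void_p

    argc = ctypes.c_int(0)
    argv_ptr = CommandLineToArgvW(GetCommandLineW(), ctypes.byref(argc))
    if not argv_ptr:
        # The call failed: fall back to the original arguments.
        return list(sys.argv)
    try:
        args = [argv_ptr[i] for i in range(argc.value)]
    finally:
        LocalFree(ctypes.cast(argv_ptr, ctypes.c_void_p))

    # Assumption: the full command line also contains the interpreter and its
    # options, so keep only the last len(sys.argv) items.
    start = max(0, len(args) - len(sys.argv))
    return args[start:]
```

A script could then use get_unicode_argv_sketch() instead of sys.argv when it needs the file names as unicode strings.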

decalage2 commented 6 years ago

Adding the following lines seems to fix the issue:

import win_unicode_console
win_unicode_console.enable(use_unicode_argv=True)
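Since the problem only shows up on Windows with Python 2, the call could be guarded so that other platforms are unaffected and win_unicode_console stays an optional dependency. A minimal sketch, assuming a hypothetical helper named enable_unicode_argv (not part of oletools):

```python
import sys


def enable_unicode_argv():
    """Enable unicode sys.argv on Windows + Python 2 (hypothetical helper).

    Returns True if win_unicode_console was enabled, False otherwise.
    """
    # Python 3 and non-Windows platforms do not need the workaround.
    if sys.platform != "win32" or sys.version_info[0] >= 3:
        return False
    try:
        import win_unicode_console
    except ImportError:
        # Optional dependency not installed: leave sys.argv untouched.
        return False
    win_unicode_console.enable(use_unicode_argv=True)
    return True
```

It would need to be called at the very start of each tool's main(), before the command line is parsed.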