decalage2 / oletools

oletools - python tools to analyze MS OLE2 files (Structured Storage, Compound File Binary Format) and MS Office documents, for malware analysis, forensics and debugging.
http://www.decalage.info/python/oletools

Multiple tests failing on UnicodeEncodeError: 'ascii' codec can't encode character - RHEL7 + python3.6 #505

Open xambroz opened 5 years ago

xambroz commented 5 years ago

Affected tools: msodde, oleobj, ooxml

Describe the bug When trying to build the oletools package for RHEL7, multiple 'ascii' codec encoding and decoding errors are reported by "python3.6 setup.py test".

File/Malware sample to reproduce the bug All test cases already part of the 0.54.2b release.

How To Reproduce the bug

Expected behavior The test cases should run without errors.

Console output / Screenshots

check that xml leads to 0 exit status ... /builddir/build/BUILD/oletools-0.54.2b/oletools/crypto.py:244: ResourceWarning: unclosed file <_io.BufferedReader name='/builddir/build/BUILD/oletools-0.54.2b/tests/test-data/msodde/harmless-clean-2003.xml'>
  'encrypted.'.format(some_file, exc))
FAIL
test all files in oleobj test dir ... ERROR
test_md5_args (tests.oleobj.test_basic.TestOleObj) ... ERROR
Ensure old oleobj behaviour still works: pre-read whole file ... ERROR
Checks all samples, expect either ole files or good ooxml output ... FAIL

======================================================================
ERROR: test_xml (tests.msodde.test_basic.TestDdeLinks)
check that dde in xml from word / excel is found
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/oletools-0.54.2b/tests/msodde/test_basic.py", line 164, in test_xml
    field_filter_mode=msodde.FIELD_FILTER_BLACKLIST)
  File "/builddir/build/BUILD/oletools-0.54.2b/oletools/msodde.py", line 974, in process_maybe_encrypted
    result = process_file(filepath, **kwargs)
  File "/builddir/build/BUILD/oletools-0.54.2b/oletools/msodde.py", line 947, in process_file
    return process_csv(filepath)
  File "/builddir/build/BUILD/oletools-0.54.2b/oletools/msodde.py", line 826, in process_csv
    results, dialect = process_csv_dialect(file_handle, CSV_DELIMITERS)
  File "/builddir/build/BUILD/oletools-0.54.2b/oletools/msodde.py", line 858, in process_csv_dialect
    dialect = csv.Sniffer().sniff(file_handle.read(CSV_SMALL_THRESH),
  File "/usr/lib64/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 5469: ordinal not in range(128)
======================================================================
ERROR: test_md5 (tests.oleobj.test_basic.TestOleObj)
test all files in oleobj test dir
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/oletools-0.54.2b/tests/oleobj/test_basic.py", line 94, in test_md5
    self.do_test_md5(['-d', self.temp_dir])
  File "/builddir/build/BUILD/oletools-0.54.2b/tests/oleobj/test_basic.py", line 133, in do_test_md5
    accept_nonzero_exit=True)
  File "/builddir/build/BUILD/oletools-0.54.2b/tests/test_utils/utils.py", line 64, in call_and_capture
    stderr=PIPE if exclude_stderr else STDOUT)
  File "/usr/lib64/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib64/python3.6/subprocess.py", line 425, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/usr/lib64/python3.6/subprocess.py", line 850, in communicate
    stdout = self.stdout.read()
  File "/usr/lib64/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 462: ordinal not in range(128)
======================================================================
ERROR: test_md5_args (tests.oleobj.test_basic.TestOleObj)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/oletools-0.54.2b/tests/oleobj/test_basic.py", line 103, in test_md5_args
    self.do_test_md5(['-d', self.temp_dir, '-v', '-i'])
  File "/builddir/build/BUILD/oletools-0.54.2b/tests/oleobj/test_basic.py", line 133, in do_test_md5
    accept_nonzero_exit=True)
  File "/builddir/build/BUILD/oletools-0.54.2b/tests/test_utils/utils.py", line 64, in call_and_capture
    stderr=PIPE if exclude_stderr else STDOUT)
  File "/usr/lib64/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib64/python3.6/subprocess.py", line 425, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/usr/lib64/python3.6/subprocess.py", line 850, in communicate
    stdout = self.stdout.read()
  File "/usr/lib64/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1917: ordinal not in range(128)
======================================================================
ERROR: test_non_streamed (tests.oleobj.test_basic.TestOleObj)
Ensure old oleobj behaviour still works: pre-read whole file
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/oletools-0.54.2b/tests/oleobj/test_basic.py", line 157, in test_non_streamed
    only_run_every=4)
  File "/builddir/build/BUILD/oletools-0.54.2b/tests/oleobj/test_basic.py", line 135, in do_test_md5
    ret_val = test_fun(args_with_path)
  File "/builddir/build/BUILD/oletools-0.54.2b/tests/oleobj/test_basic.py", line 70, in preread_file
    oleobj.process_file(filename, data, output_dir=output_dir)
  File "/builddir/build/BUILD/oletools-0.54.2b/oletools/oleobj.py", line 797, in process_file
    print(u'Filename = "%s"' % opkg.filename)
UnicodeEncodeError: 'ascii' codec can't encode character '\xfc' in position 12: ordinal not in range(128)
======================================================================
FAIL: test_valid_xml (tests.msodde.test_basic.TestReturnCode)
check that xml leads to 0 exit status
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/oletools-0.54.2b/tests/msodde/test_basic.py", line 48, in test_valid_xml
    self.do_test_validity(join(BASE_DIR, 'msodde', filename))
  File "/builddir/build/BUILD/oletools-0.54.2b/tests/msodde/test_basic.py", line 104, in do_test_validity
    .format(found_error, filename))
AssertionError: Unexpected error 'ascii' codec can't decode byte 0xc3 in position 4560: ordinal not in range(128) from msodde for /builddir/build/BUILD/oletools-0.54.2b/tests/test-data/msodde/harmless-clean-2003.xml
======================================================================
FAIL: test_rough_doctype (tests.ooxml.test_basic.TestOOXML)
Checks all samples, expect either ole files or good ooxml output
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/oletools-0.54.2b/tests/ooxml/test_basic.py", line 68, in test_rough_doctype
    doctype = ooxml.get_type(full_name)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 5469: ordinal not in range(128)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/builddir/build/BUILD/oletools-0.54.2b/tests/ooxml/test_basic.py", line 70, in test_rough_doctype
    self.fail('Failed to get doctype of {0}'.format(filename))
AssertionError: Failed to get doctype of dde-in-word2007.xml
----------------------------------------------------------------------
Ran 61 tests in 285.694s
FAILED (failures=2, errors=4)
Test failed: <unittest.runner.TextTestResult run=61 errors=4 failures=2>

Version information:

Additional context When I ran the tests interactively, or rebuilt the package interactively on my own RHEL7 system with "rpmbuild -ba python-oletools.spec", I did not observe these errors. It could be a race condition, or something related to Python's default encoding.
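One way to test the default-encoding hypothesis in isolation is to read a small file containing the same kind of bytes as in the tracebacks (0xef, 0xc3), once with an explicit ASCII codec and once with an explicit UTF-8 codec. This is only a sketch: the sample bytes, `read_text` and `demo` are made up for illustration and are not part of oletools or its test data.

```python
import io
import os
import tempfile


def read_text(path, encoding):
    """Read a whole file as text using an explicitly chosen encoding."""
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.read()


def demo():
    # Hypothetical sample: a UTF-8 byte order mark (EF BB BF) followed by
    # a line containing a u-umlaut encoded as UTF-8 (C3 BC).
    raw = b"\xef\xbb\xbfname;value\n\xc3\xbc;1\n"
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(raw)
        try:
            read_text(path, "ascii")
            ascii_ok = True
        except UnicodeDecodeError:
            # Same error class as in the msodde and ooxml tracebacks.
            ascii_ok = False
        # "utf-8-sig" decodes UTF-8 and drops a leading byte order mark.
        utf8_text = read_text(path, "utf-8-sig")
    finally:
        os.remove(path)
    return ascii_ok, utf8_text


if __name__ == "__main__":
    print(demo())
```

If the ASCII read fails and the UTF-8 read succeeds, the errors are consistent with text being decoded using an ASCII default rather than with a race condition.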

decalage2 commented 5 years ago

@christian-intra2net, could you please have a look?

christian-intra2net commented 5 years ago

Gladly

christian-intra2net commented 5 years ago

Could be related to the fact that during rpm-building the tests run in a very minimalistic environment, e.g. without LANG set (or maybe set to "C"). I hope to have a closer look tomorrow.
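To check what the build environment actually provides, a small diagnostic like the following could be run inside the rpm build (for example from the spec file's test step). It is a sketch only; `encoding_report` is a hypothetical helper, not an oletools function.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the locale settings and encodings Python is currently using."""
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        # Encoding used by open() when no encoding argument is given.
        "preferred": locale.getpreferredencoding(False),
        # Encoding used by print(); may be None if stdout was replaced.
        "stdout": getattr(sys.stdout, "encoding", None),
        "filesystem": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for key, value in sorted(encoding_report().items()):
        print("%s = %s" % (key, value))
```

Note that newer Python versions may coerce a "C" locale to a UTF-8 one, so the values reported there can differ from what Python 3.6 reports in the same environment.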

christian-intra2net commented 5 years ago

I can reproduce your errors by prepending "LANG=C" to the call to "python3 setup.py test". However, this problem is fixed in current master (after commit 2f7a1ef1b347a1124d01ef6559d939be5ccf50cd: "Merge pull request #365 ...")

So I suggest you either build the rpm from a newer version or supply your build process with a proper LANG.
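If changing LANG for the whole build is not possible, a test helper that launches a subprocess can pass an adjusted environment to the child and decode the captured bytes itself. The sketch below uses the standard `PYTHONIOENCODING` variable; `run_with_utf8_io` and `CHILD_CODE` are hypothetical names, and this is not necessarily how the fix in master works.

```python
import os
import subprocess
import sys

# Hypothetical child program: prints a non-ASCII character (u-umlaut).
CHILD_CODE = "print(u'name: \\xfc')"


def run_with_utf8_io(args):
    """Run a Python child process with UTF-8 standard streams."""
    env = dict(os.environ)
    # Forces the child's stdin/stdout/stderr to UTF-8 regardless of LANG.
    env["PYTHONIOENCODING"] = "utf-8"
    raw = subprocess.check_output(args, env=env, stderr=subprocess.STDOUT)
    # Decode explicitly instead of relying on the parent's locale.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    output = run_with_utf8_io([sys.executable, "-c", CHILD_CODE])
    sys.stdout.write(repr(output.strip()) + "\n")
```

`PYTHONIOENCODING` only affects the standard streams. Files opened without an explicit encoding still follow the locale, so this complements a proper LANG rather than replacing it.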

christian-intra2net commented 5 years ago

@decalage2 Maybe create a new release (either 0.55 or 0.54.3) to simplify this?

decalage2 commented 5 years ago

Yes, I'd like to release 0.55 in the coming weeks, once I have fixed a few other issues.