fancycode / pylzma

Python bindings for the LZMA library
http://www.joachim-bauch.de/projects/pylzma/
GNU Lesser General Public License v2.1

py7zlib fails when decompressing lzma2 bcj2 7z file #60

Closed: miurahr closed this issue 5 years ago.

miurahr commented 5 years ago

py7zlib fails the decompression test for the attached file testcase.tar.gz (please extract the 7z file from the .tgz). The archive is listed as follows:


Scanning the drive for archives:
1 file, 250 bytes (1 KiB)

Listing archive: test_lzma2bcj2.7z

--
Path = test_lzma2bcj2.7z
Type = 7z
Physical Size = 250
Headers Size = 190
Method = LZMA2:12 BCJ2
Solid = +
Blocks = 1

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2006-03-16 06:54:41 D....            0            0  test
2006-03-16 06:43:36 ....A           33           60  test/test2.txt
2006-03-16 06:43:48 ....A           33               test1.txt
------------------- ----- ------------ ------------  ------------------------
2006-03-16 06:54:41                 66           60  2 files, 1 folders
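The listing shows a single solid block (Solid = +, Blocks = 1) that holds both 33-byte files, 66 bytes in total. That is why test1.txt has no compressed size of its own. The sketch below is a hypothetical helper, not py7zlib code. It slices a decoded solid block into per-file payloads and computes each file's CRC32, which is the kind of per-file check that checkcrc() performs. It uses the decoded bytes quoted later in this report. It also assumes that members are stored in the block in listing order and that the directory entry takes no space.

```python
import zlib

# Decoded contents of the single solid block, as quoted in this report.
block = b"This file is located in a folder.This file is located in the root."

# Member names and unpacked sizes from the listing above.
# ASSUMPTION: files are stored in listing order; directories occupy no bytes.
layout = [("test/test2.txt", 33), ("test1.txt", 33)]


def split_solid_block(data: bytes, members):
    """Hypothetical helper: slice a decoded solid block into per-file payloads."""
    out, offset = {}, 0
    for name, size in members:
        out[name] = data[offset:offset + size]
        offset += size
    if offset != len(data):
        raise ValueError(f"sizes sum to {offset}, block has {len(data)} bytes")
    return out


files = split_solid_block(block, layout)
for name, payload in files.items():
    # 7z stores a CRC-32 per file; zlib.crc32 uses the same polynomial.
    print(name, len(payload), f"{zlib.crc32(payload):08x}")
```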

Test code:


    def test_lzma2bcj2(self):
        fp = self._open_file(os.path.join(ROOT, 'data', 'test_lzma2bcj2.7z'), 'rb')
        archive = Archive7z(fp)
        self._test_decode_all(archive)

Result:

Error
Traceback (most recent call last):
  File "/usr/lib/python3.5/unittest/case.py", line 58, in testPartExecutor
    yield
  File "/usr/lib/python3.5/unittest/case.py", line 600, in run
    testMethod()
  File "/home/miurahr/projects/pylzma/tests/test_7zfiles.py", line 111, in test_lzma2bcj2
    self._test_decode_all(archive)
  File "/home/miurahr/projects/pylzma/tests/test_7zfiles.py", line 105, in _test_decode_all
    self.assertTrue(cf.checkcrc(), 'crc failed for %s' % (filename))
  File "/home/miurahr/projects/pylzma/py7zlib.py", line 776, in checkcrc
    data = self.read()
  File "/home/miurahr/projects/pylzma/py7zlib.py", line 632, in read
    data = getattr(self, decoder)(coder, data, level, num_coders)
  File "/home/miurahr/projects/pylzma/py7zlib.py", line 702, in _read_lzma
    return self._read_from_decompressor(coder, dec, input, level, num_coders, with_cache=True)
  File "/home/miurahr/projects/pylzma/py7zlib.py", line 686, in _read_from_decompressor
    data = decompressor.decompress(input, self._start+size)
ValueError: data error during decompression

Observing the call data = getattr(self, decoder)(coder, data, level, num_coders) at py7zlib.py:632 shows that it is invoked three times. The second call already returns the fully decoded data, b'This file is located in a folder.This file is located in the root.', and the third call then fails.
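The traceback ends in a ValueError raised while an LZMA decompressor consumes its input. The sketch below offers a loose analogy. It uses Python's standard-library lzma module (liblzma) instead of pylzma, and a raw LZMA2 filter chain chosen only for illustration. It shows that the decoder round-trips its own stream but rejects bytes that are not an LZMA2 stream, such as already-decoded text. Whether py7zlib's third call actually receives such bytes is the reporter's reading above; this sketch does not establish it.

```python
import lzma

# Raw (headerless) LZMA2 filter chain, decoded by stdlib lzma (liblzma).
# Used only as an analogy for pylzma's decompressor; not py7zlib code.
FILTERS = [{"id": lzma.FILTER_LZMA2}]

# Decoded text quoted in the report.
plain = b"This file is located in a folder.This file is located in the root."


def decode_raw_lzma2(data: bytes) -> bytes:
    """Decode a raw LZMA2 stream with the filter chain above."""
    return lzma.decompress(data, format=lzma.FORMAT_RAW, filters=FILTERS)


def try_decode(data: bytes) -> str:
    """Return 'ok' on success, or the decoder's error message instead of raising."""
    try:
        decode_raw_lzma2(data)
        return "ok"
    except lzma.LZMAError as exc:
        return f"error: {exc}"


packed = lzma.compress(plain, format=lzma.FORMAT_RAW, filters=FILTERS)
print(try_decode(packed))  # the encoder's own stream decodes cleanly
print(try_decode(plain))   # already-decoded text is not a valid LZMA2 stream
```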

miurahr commented 5 years ago

After inserting the following line, this specific test case passes. This suggests that, in this case, it may be enough to simply copy the input through for the last coder (is_last_coder), which is the BCJ filter:

--- a/py7zlib.py
+++ b/py7zlib.py
@@ -683,6 +683,7 @@ class ArchiveFile(Base):
                 self._file.seek(self._src_start)
                 input = self._file.read(total)
             if is_last_coder and can_partial_decompress:
+                return input[self._start:self._start+size]
                 data = decompressor.decompress(input, self._start+size)
             else:
                 data = decompressor.decompress(input)

Because the BCJ filter is designed for x86 executables, p7zip may skip it for text and other non-executable files.
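BCJ-style x86 filters rewrite the 4-byte relative operand that follows CALL (0xE8) and JMP (0xE9) opcodes into an absolute address when encoding, and reverse this when decoding, so that executables compress better. The sketch below is a deliberately simplified version written for illustration. It omits the real x86 BCJ filter's additional checks. It also leaves out the way BCJ2 splits data across separate streams. It demonstrates one point that bears on the pass-through above: input containing no 0xE8/0xE9 bytes, such as the ASCII text in this archive, comes out byte-for-byte unchanged. That result is consistent with a plain copy passing this test, but it is not a description of what p7zip actually does.

```python
def bcj_x86_simplified(data: bytes, start: int = 0, encode: bool = False) -> bytes:
    """Simplified E8/E9 relative<->absolute conversion (not the real BCJ/BCJ2)."""
    buf = bytearray(data)
    i = 0
    while i + 5 <= len(buf):
        if buf[i] in (0xE8, 0xE9):  # CALL rel32 / JMP rel32 opcode
            operand = int.from_bytes(buf[i + 1:i + 5], "little")
            pc = start + i + 5  # address of the next instruction
            if encode:
                new = (operand + pc) & 0xFFFFFFFF  # relative -> absolute
            else:
                new = (operand - pc) & 0xFFFFFFFF  # absolute -> relative
            buf[i + 1:i + 5] = new.to_bytes(4, "little")
            i += 5  # skip the operand we just rewrote
        else:
            i += 1
    return bytes(buf)


text = b"This file is located in a folder.This file is located in the root."
code = b"\x90\xE8\x10\x00\x00\x00\x90"  # NOP, CALL +0x10, NOP

print(bcj_x86_simplified(text) == text)  # no E8/E9 bytes: unchanged
encoded = bcj_x86_simplified(code, encode=True)
print(encoded.hex(), bcj_x86_simplified(encoded) == code)
```

Encoding and decoding visit the same opcode positions, because the opcode bytes themselves are never modified. As a result, decoding an encoded buffer restores the original.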

fancycode commented 5 years ago

Thanks for reporting. BCJ2 streams are now supported.