claird / PyPDF4

A utility to read and write PDFs with Python

Does LZWDecode function as it should? #7

Open acsor opened 6 years ago

acsor commented 6 years ago

I'm developing unit tests for all the classes in filters.py. With the following code I get several kinds of exceptions, depending on the input:

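import unittest

from PyPDF4.filters import LZWDecode


class LZWDecodeTestCase(unittest.TestCase):
    def test_decode(self):
        # LZW-encoded inputs; both end with the bytes 0x85 0x01 (EOD marker).
        inputs = (
            "\x80\x0B\x60\x50\x22\x0C\x0C\x85\x01",
            "\x54\x68\x69\x73\x20\x82\x20\x61\x20\x73\x61\x6d\x70\x6c\x65\x20"
            "\x74\x65\x78\x74\x88\x74\x72\x69\x6e\x67\x20\x49\x27\x64\x20\x6c"
            "\x69\x6b\x8e\x74\x6f\x20\x65\x6e\x63\x6f\x64\x65\x85\x01",
        )
        # Expected decoded text, in the same order as the inputs.
        exp_outputs = (
            "-----A---B",
            "This is a sample text string I'd like to encode",
        )

        for o, i in zip(exp_outputs, inputs):
            self.assertEqual(
                o, LZWDecode.decode(i),
                # Fill in the placeholders so a failure reports the actual values.
                "Input = %r\tExp. output = %r" % (i, o)
            )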
Error
Traceback (most recent call last):
  File "/usr/lib/python3.6/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/usr/lib/python3.6/unittest/case.py", line 605, in run
    testMethod()
  File "/home/none/software/Git clones/PyPDF4/Tests/test_filters.py", line 175, in test_decode
    o, LZWDecode.decode(i),
  File "/home/none/software/Git clones/PyPDF4/PyPDF4/filters.py", line 349, in decode
    return LZWDecode.Decoder(data).decode()
  File "/home/none/software/Git clones/PyPDF4/PyPDF4/filters.py", line 337, in decode
    p = self.dict[pW] + self.dict[pW][0]
IndexError: string index out of range
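
To cross-check the expected output independently of PyPDF4, here is a minimal reference decoder. It is only a sketch: `lzw_decode` and its `early_change` argument are names made up for this example (the latter modelled on the EarlyChange filter parameter, default 1), not part of the library, and it assumes 9-to-12-bit codes packed most-significant-bit first, with 256 as the clear-table code and 257 as EOD. I only ran it on the first test input.

```python
def lzw_decode(data, early_change=1):
    """Hypothetical reference LZW decoder (not PyPDF4's implementation)."""
    CLEAR_TABLE, EOD = 256, 257

    # Treat the whole input as one big integer so codes can be sliced MSB-first.
    total_bits = len(data) * 8
    stream = int.from_bytes(data, "big")
    pos = 0

    # Entries 0..255 are single bytes; 256 and 257 are reserved placeholders.
    table = [bytes([b]) for b in range(256)] + [b"", b""]
    width = 9
    prev = None
    out = bytearray()

    while pos + width <= total_bits:
        code = (stream >> (total_bits - pos - width)) & ((1 << width) - 1)
        pos += width

        if code == CLEAR_TABLE:
            del table[258:]
            width = 9
            prev = None
            continue
        if code == EOD:
            break

        if prev is None:
            # First code after a clear: must be a literal byte.
            entry = table[code]
        else:
            if code < len(table):
                entry = table[code]
            elif code == len(table):
                # Code not in the table yet: previous entry plus its first byte.
                entry = prev + prev[:1]
            else:
                raise ValueError("invalid LZW code %d" % code)
            table.append(prev + entry[:1])

        out += entry
        prev = entry

        # Grow the code width (assumed early-change convention), capped at 12.
        if width < 12 and len(table) + early_change >= (1 << width):
            width += 1

    return bytes(out)


print(lzw_decode(b"\x80\x0B\x60\x50\x22\x0C\x0C\x85\x01"))  # b'-----A---B'
```

Note that the sketch skips the two reserved entries (256 and 257) instead of indexing into them.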

You can try out the whole setup by cloning my fork of the project and checking out the feature_tests branch. The test commands are the usual ones:

python2 -m unittest discover --start-directory Tests/
python3 -m unittest discover --start-directory Tests/

Please note that the last two bytes of each test input are 0x8501, whose last nine bits equal 257, that is, the EOD marker. Refer to the ISO 32000 standard for more information.
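
The EOD value can be checked with a couple of lines of Python. This is just arithmetic on the trailing two bytes, assuming the final code is nine bits wide; the names `tail` and `eod` are made up for the example.

```python
tail = 0x8501                # last two bytes of each test input
eod = tail & 0x1FF           # keep only the low nine bits

print(format(tail, "016b"))  # 1000010100000001
print(eod)                   # 257
```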

acsor commented 6 years ago

Please note that LZWDecode has now been renamed to LZWCodec.