claird / PyPDF4

A utility to read and write PDFs with Python
(obsolete) https://pythonhosted.org/PyPDF2/

Test utils more #44

Closed kurtmckee closed 5 years ago

kurtmckee commented 5 years ago

NOTE: This includes commit bc672fb, which is already the subject of PR #37. I suggest merging #37 first before reviewing this PR.

This more extensively tests utils.py (about +60 additional tests) and, based on the results of those tests, fixes multiple crash bugs and inconsistencies in object types that are returned by the core functions provided in utils.py.

It also removes debug code from utils.py, updates docstrings to improve IDE autocompletion, and adds the PyPDF4 license text to test_utils.py along with copyright attribution.

xilopaint commented 5 years ago

Hey @kurtmckee, do you have any idea on #33?

kurtmckee commented 5 years ago

Hey @xilopaint, let me respond in the report itself. =)

acsor commented 5 years ago

First things first: the unit tests are a good thing, and I can only encourage more :+1:.

Glancing over the code, I notice two things:

Whether to include a dependency on (yet) another testing framework should be a subject for discussion. I see you're having a lot of fun writing decorated code :-), but in doing so you are subjecting the repository to another "software bond". I, for one, prefer to keep certain things as simple as possible (including avoiding extra software requirements).

Is it too much to ask to restructure the test cases as class-based unittest code only? It is self-contained and, from what I can see, requires no more code than pytest does. If you're concerned about how you might shorten unittest code or how to write it, take inspiration:

    def testPairs(self):
        """
        Tests ``utils.pairs()``.
        """
        inputs = (
            range(0), range(6), range(10)
        )
        expOutputs = (
            tuple(),
            ((0, 1), (2, 3), (4, 5)),
            ((0, 1), (2, 3), (4, 5), (6, 7), (8, 9))
        )

        for o, i in zip(expOutputs, inputs):
            self.assertTupleEqual(
                o, tuple(pypdf.utils.pairs(i))
            )

This is a style of test code I've written over and over: typically an inputs and an expOutputs variable, followed by straightforward verifications. Only my hand and unittest produce it, and we take on no dependency.

If you see compelling reasons to adopt pytest, though, feel free to suggest them :-).
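For reference, here is one plausible implementation of the pairs() helper exercised above. This is a hypothetical sketch inferred from the expected outputs in the test; the real pypdf.utils.pairs() may be implemented differently:

```python
def pairs(sequence):
    """Group consecutive items of an iterable into 2-tuples.

    Hypothetical sketch: zip() consumes the same iterator twice,
    yielding (0, 1), (2, 3), ... and silently dropping a trailing
    odd item (the real function's odd-length behavior is unknown).
    """
    iterator = iter(sequence)
    return zip(iterator, iterator)
```

With this sketch, tuple(pairs(range(6))) yields ((0, 1), (2, 3), (4, 5)), matching the expOutputs entry above.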

kurtmckee commented 5 years ago

@newnone, thanks for the feedback!

Before responding, I need to get something straight. I'm not going to entertain a philosophical discussion about "purity through for loops" or "dependency bondage". We can have a frank, technical discussion about the code or we can teach each other. Your response indicates that this is a teaching opportunity.

I'm not going to rewrite the tests, but I'll demonstrate why tools like pytest are extremely valuable, and why dropping the pretense of "purity" will lead to more successful and rapid development.

Here's an example using unittest and a for loop:

import unittest

def find_potential_prime(i):
    return i%i**i**i-1

class TestFinder(unittest.TestCase):
    def test_finder(self):
        values = list(range(5))
        for value in values:
            self.assertGreater(find_potential_prime(value), 1)

And the output:

E
======================================================================
ERROR: test_finder (__main__.TestFinder)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "example.py", line 12, in test_finder
    self.assertGreater(find_potential_prime(value), 1)
  File "example.py", line 5, in find_potential_prime
    return i%i**i**i-1
ZeroDivisionError: integer division or modulo by zero

----------------------------------------------------------------------
Ran 1 test in 0.001s

FAILED (errors=1)

Which of the five values in the for loop caused this test failure?

Okay, maybe we can restrict the range. Let's just change the ol' range to range(1, 5) and voilà:

F
======================================================================
FAIL: test_finder (__main__.TestFinder)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "example.py", line 12, in test_finder
    self.assertGreater(find_potential_prime(value), 1)
AssertionError: -1 not greater than 1

----------------------------------------------------------------------
Ran 1 test in 0.001s

FAILED (failures=1)

Shoot fire. We went from a ZeroDivisionError to an AssertionError.

Which of the four values in the for loop caused this new, and entirely different, failure?

Okay, maybe with a combination of additional changes to the unit tests and some manual work we can isolate the value that's causing this problem. (Take note: everything we're doing now simply demonstrates that for loop-based unit tests are not actually helping us.)

Instead of all that noise, let's use better tools! Here's the same example using pytest and parametrization:

import pytest

def find_potential_prime(i):
    return i%i**i**i-1

@pytest.mark.parametrize("value", list(range(5)))
def test_finder(value):
    assert find_potential_prime(value) > 1

And the resulting output:

============================= test session starts =============================
platform win32 -- Python 3.7.3, pytest-4.4.2, py-1.8.0, pluggy-0.11.0
rootdir: C:\Users\kumckee\Desktop
collected 5 items

example.py FFF..                                                         [100%]

================================== FAILURES ===================================
_______________________________ test_finder[0] ________________________________

value = 0

    @pytest.mark.parametrize("value", list(range(5)))
    def test_finder(value):
>       assert find_potential_prime(value) > 1

example.py:10:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

i = 0

    def find_potential_prime(i):
>       return i%i**i**i-1
E       ZeroDivisionError: integer division or modulo by zero

example.py:5: ZeroDivisionError
_______________________________ test_finder[1] ________________________________

value = 1

    @pytest.mark.parametrize("value", list(range(5)))
    def test_finder(value):
>       assert find_potential_prime(value) > 1
E       assert -1 > 1
E        +  where -1 = find_potential_prime(1)

example.py:10: AssertionError
_______________________________ test_finder[2] ________________________________

value = 2

    @pytest.mark.parametrize("value", list(range(5)))
    def test_finder(value):
>       assert find_potential_prime(value) > 1
E       assert 1 > 1
E        +  where 1 = find_potential_prime(2)

example.py:10: AssertionError
===================== 3 failed, 2 passed in 0.09 seconds ======================

Hooray! All of the information we need to determine the root cause of the problem! And it only took a single test run to see every detail. =)

So no, I'm not going to rewrite the tests. Further, I encourage you to drop the mindset that real development is about writing everything by hand. Nobody will give you accolades for the unnecessary work you force upon yourself or others, and it will hinder your personal development as a programmer.

acsor commented 5 years ago

Sorry for the late reply, I'm in the midst of exam season.

This is a nice discussion, and I hope my main point is understood: I'm not averse to external dependencies in general, only to those whose learning cost exceeds the productivity benefit.

IMO the PyPDF community should be urged to adopt consistent coding styles. Prior to the transition to PyPDF4, the repo consisted of multiple, disparate code snippets with divergent coding styles, naming conventions, and objectives. That surely accumulates entropy in the long run.

Regarding the pros of using pytest:

Instead of all that noise, let's use better tools!

Like... a debugger? Of course this is a very narrow scenario, and pytest's benefit may come from many others, but in this particular instance it can easily be countered by simpler, ubiquitous tools.

Further, I encourage you to drop the mindset that real development is about writing everything by hand

Of course, but you must not ignore the time, mental effort, and brain cells it takes to learn a new technology. Measuring productivity by LOC alone is extremely reductive and does not reflect all the energy spent on actual development. To say nothing of volunteers: requiring each of them to unit test their code while forcing a (possibly) unfamiliar testing framework on them is likely to create barriers to what they can individually contribute. Hence the choice of the more popular, albeit simpler, unittest module.

I hope this ends our friendly diatribe :-), and that the exchange of ideas helps the appointed maintainers form a more accurate judgement on what strategy to follow.

kurtmckee commented 5 years ago

Re: coding styles

This discussion is not about coding style.

Re: using a debugger when unit tests fail

The point of unit tests is to validate that code functions the way it is supposed to, and the testing methodology is just as important as the testing result itself. Your for-loop method prevents the unit tests from being helpful: it not only denies you useful output when a failure occurs, it also prevents all subsequent assertions in the loop from executing. Suggesting that developers should run a debugger on their unit tests implicitly acknowledges that the testing methodology is flawed, yet still demonstrates blindness to the issue.

Re: Measuring productivity by lines of code

Point of order: you alone have mentioned LOC. Twice.

My point was and is: pytest gives better failure reporting, and pytest's parametrized tests capture the idea of using for loops in unit tests but give you the information you need when they fail.

Re: forcing a foreign testing framework

It's clear that you haven't tried using tox to run the tests. If you had, you would discover that it automatically tests all supported versions of Python at once, manages all of the testing dependencies for you in a virtual environment (including installing pytest and pytest-cov), generates a complete HTML coverage report, and still runs all of the unittest-based tests that have already been written.

It is far, far more advanced and helpful than what you're currently using, and has zero impact on your current development methodology.
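To make that concrete, a minimal tox.ini along these lines would suffice (the env list, package name, and paths here are illustrative, not the project's actual configuration):

```ini
[tox]
envlist = py27, py37

[testenv]
; Testing dependencies are installed into an isolated virtualenv.
deps =
    pytest
    pytest-cov
; pytest collects unittest.TestCase classes too, so the existing
; unittest-based tests keep running; --cov-report emits HTML coverage.
commands =
    pytest --cov=pypdf --cov-report=html
```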

Please merge or close this PR. Thanks!

acsor commented 5 years ago

Please merge or close this PR. Thanks!

To reject such precious changes would be reckless. Please be patient; it will be a little while before either @claird or I manage to review your code.

acsor commented 5 years ago

@kurtmckee although I'm almost done integrating your changes, do you recommend a merge strategy given the way you submitted your PRs?