exercism / python

Exercism exercises in Python.
https://exercism.org/tracks/python
MIT License

Unable to simulate TDD with pytest #1853

Closed glvno closed 4 years ago

glvno commented 5 years ago

I'm not sure how it is with previous versions, but with Python 3.7.3 and Pytest 5.0.1, tests are run in alphabetical order rather than in the order they appear in the test file. As I understand it, Exercism's explicit pedagogical aim is to simulate TDD, and so it seems pretty important that the tests get run in the order they appear in the file such that pytest -x stops at the simplest of the failing test cases -- i.e., the case a test-driven developer would have written first.

As it stands now, the first failing test is often for an edge case that isn't really useful until the program's basic functionality is up and running.

In the minitest files for the Ruby track, if I remember correctly, # skip lines are included for all but the first test, and the learner is instructed to uncomment the skip for the next test once they get a given test to pass.

I would imagine there's a more elegant fix for this than simply going back through all the track's test files and adding hundreds of # @pytest.mark.skip decorators, but I've yet to figure out what it is. Maybe there's some way to configure pytest to run the tests in the order they appear in the test file that I'm just unaware of? I've spent a couple of hours looking into this to no avail so far.

yawpitch commented 5 years ago

Tests created using unittest — which includes all tests in the track — will, IIRC, always run in name-sort order, so to ensure a particular sort order we'd need to manually rename the tests to produce the desired order. Also, as we're slowly moving towards auto-generated test suites, we'd have to assume that the definition order in the canonical-data.json file feeding the test generation was the correct ordering for a true TDD experience. Based on a very rough sample of canonical data files I've looked at recently, I'd say that assumption is dubious at best.

Can you give an example of a particular exercise where the sort order you’re getting is that far off ideal?

glvno commented 5 years ago

Here's the first test from the word count exercise test suite:

    def test_count_one_word(self):
        self.assertEqual(
            count_words('word'),
            {'word': 1}
        )

And here's the first one that gets run:

    def test_alternating_word_separators_not_detected_as_a_word(self):
        self.assertEqual(
            count_words(",\n,one,\n ,two \n 'three'"),
            {'one': 1, 'two': 1, 'three': 1}
        )

I have no doubt that the canonical test order isn't a 100% perfect approximation of the TDD experience, but I would think that running the 11th test in the file before the first, as is the case here, is pretty much always going to result in an experience that is not even close.

But if it's indeed the case that renaming every test is the only solution, then yeah, I don't mind just dealing with it myself by adding decorators.

yawpitch commented 5 years ago

@glvno thank you for the example, that's an exercise that is indeed reasonably well crafted so that the order of tests can be seen as significant.

It's an interesting problem: general practice in unit testing is that the order of the tests should not matter. In fact many testing suites run tests in parallel as a matter of practice, so the order cannot matter. This is a little less true of integration tests, though even there it's -- again generally -- considered best if the tests are decoupled and order is irrelevant. Good TDD practice implies adding tests in a natural order, but it doesn't mean running them in that order, if you see what I mean.

That said, the developers of test runners tend to choose an order as a matter of implementation; the unittest developers chose lexicographic sort order and the pytest people chose discovery order (which is declaration order for tests written as simple functions, but it defers to unittest's ordering for tests that are methods of unittest.TestCase subclasses).

Implementing the tests as subclasses of unittest.TestCase is and was sensible for the Python track, because that's the only "universal" approach that all the various Python test runners adhere to, but it leaves us with lexicographic sort order, for better or worse, as the de facto ordering of tests.
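
To make that concrete, here's a minimal, hypothetical illustration (the class and method names are invented, not taken from a real exercise) of how a TestCase subclass gets its methods discovered in name order rather than declaration order:

import unittest


class ExampleOrderingTest(unittest.TestCase):
    def test_count_one_word(self):          # declared first in the file ...
        self.assertTrue(True)

    def test_alternating_separators(self):  # ... but discovered first, since
        self.assertTrue(True)               # "a" sorts before "c"


if __name__ == "__main__":
    # Prints ['test_alternating_separators', 'test_count_one_word'] -- the
    # lexicographic discovery order -- then runs the tests in that order.
    print(unittest.TestLoader().getTestCaseNames(ExampleOrderingTest))
    unittest.main()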

Where we could fix this is in test generation, which right now is a manual process. We'd have to do that by taking the order defined in the canonical data and generating test names that sort lexicographically into that same order.

@cmccandless should we add that to the list for test generation? It's going to result in some hideous method names, but it would make an iterative pytest -x experience a bit nicer.

That said, my concern with doing so is that I've noticed several instances in which the ordering of the tests seems to heavily influence the solutions some students are providing -- in that their solution, for better or worse, follows from trying to progress through the tests in the order they're presented, rather than from engaging with the ideas -- when it should not have that influence. In the "real" world tests will not be curated to present in a specific order, though I'm realizing from the debate in exercism/problem-specifications#1553 that "real world" isn't necessarily a concern here.

glvno commented 5 years ago

It's an interesting problem: general practice in unit testing is that the order of the tests should not matter. In fact many testing suites run tests in parallel as a matter of practice, so the order cannot matter ... Good TDD practice implies adding tests in a natural order, but it doesn't mean running them in that order, if you see what I mean ... It's going to result in some hideous method names, but it would make an iterative pytest -x experience a bit nicer.

I have no idea how test generation works, so maybe this isn't feasible, but it seems like using commented-out skips would address both issues as well as keep the test method names from getting too gnarly. The experience would be essentially identical to writing tests in a natural order in that each time a test is uncommented ('written') it gets run with every other uncommented test in an arbitrary order.

yawpitch commented 5 years ago

Test generation is not, as of yet, working, so any usage of it for this purpose is a bit of a reach goal at the moment. The names wouldn't be too egregious; they'd basically all have to become test_00_count_one_word, test_01_some_other_test, and so on. Not ideal, but probably doable.
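
A minimal sketch of what such generated names might look like (method names invented for illustration); the zero-padded prefixes make unittest's alphabetical sort coincide with the intended progression:

import unittest


class ExamplePrefixedTest(unittest.TestCase):
    # Zero-padded numeric prefixes force the lexicographic sort that unittest
    # applies to match the intended TDD ordering.
    def test_00_count_one_word(self):
        self.assertTrue(True)

    def test_01_count_multiple_words(self):
        self.assertTrue(True)

    def test_02_ignores_punctuation(self):
        self.assertTrue(True)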

In theory we could put in marks for the pytest-ordering plugin, and if the user had that installed they'd get ordered tests right off the bat. The problem is that for everyone else there'd be a slew of warnings whenever they ran the tests. Plus the plugin itself is alpha and seems to have some issues. So not ideal.
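
For reference, a sketch of what those marks would look like (this assumes the third-party pytest-ordering plugin is installed; without it, recent pytest versions emit unknown-mark warnings and the ordering is simply ignored):

import unittest

import pytest


class ExampleMarkedTest(unittest.TestCase):
    # pytest-ordering reorders the collected tests according to these marks.
    @pytest.mark.run(order=1)
    def test_count_one_word(self):
        self.assertTrue(True)

    @pytest.mark.run(order=2)
    def test_alternating_separators(self):
        self.assertTrue(True)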

Marking all the tests in all the exercises with a commented # @pytest.mark.skip on every test beyond the first would be a big pull request -- there are 117 files to modify -- but not that difficult to create. However, it would have two issues:

  1. It would force all 117 exercises into an "upgrade available" state, which could trigger a whole lot of people to update a whole lot of exercises without the tests actually changing. Potentially disruptive and confusing.
  2. You'd essentially end up in the same state you're in now, since all the tests would still run unless you uncommented all of the skips (i.e., a commented # @pytest.mark.skip has no effect and the test is run, whereas an active @pytest.mark.skip skips the test -- it's a bit unclear from the above whether that's how you understand the behaviour).

Pushing out a change that disables all but the first test on all 117 exercises is, unfortunately, a clear non-starter, as that would break the existing semantics for mentors and students alike.

yawpitch commented 5 years ago

@glvno I've talked this through with @cmccandless and we agree that this would be a sensible thing to try to address, but after a bit of investigation I've come up with some complications.

  1. We should not use @pytest.mark.skip; that requires an import of pytest into the test module, and there's no guarantee that the user will have pytest installed, and I don't believe we want to require them to install it. Adding an "import if possible, else fallback to a dummy implementation" section at the top of the file would be possible, but clumsy.
  2. We would use @unittest.skip, which is what we're currently using to mark "bonus" tests.
  3. We would mark all tests with a commented (thus disabled) #@unittest.skip, keeping the current semantics and keeping the existing lexicographic ordering; the user who desired definition order testing would then be responsible for manually un-commenting these to enable the skipping behaviour, as well as re-commenting as they proceed through the tests.
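
A minimal sketch of what that third option might look like in a test file (exercise and method names invented; only the pattern matters). Every test beyond the first carries a commented-out decorator, so the default behaviour is unchanged; a student who wants a TDD-style progression un-comments the decorators and then re-comments each one as they reach it:

import unittest


class ExampleSkipPatternTest(unittest.TestCase):
    def test_count_one_word(self):
        self.assertTrue(True)

    # @unittest.skip("un-comment this line to defer the test until you reach it")
    def test_count_multiple_words(self):
        self.assertTrue(True)

    # @unittest.skip("un-comment this line to defer the test until you reach it")
    def test_ignores_punctuation(self):
        self.assertTrue(True)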

Also, there's an ambiguity we'd need to resolve. Some of the test files have more than one TestCase, so would it be best to:

  1. Mark all test methods of all TestCase classes in the file as @unittest.skip except the first test method of the first TestCase?
  2. For every TestCase mark all of the tests beyond the first as @unittest.skip, and for every TestCase beyond the first mark the entire class as @unittest.skip?

I'm leaning towards the second option, but it adds more noise to the file. That said, having never actually checked, I'm not sure what happens if all of a TestCase's methods are marked to skip.
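
For what it's worth, a sketch of that second option, again with invented names; as far as I know, if every method of a TestCase ends up skipped, unittest simply reports each of them as skipped rather than erroring:

import unittest


class BasicTest(unittest.TestCase):
    def test_first(self):
        self.assertTrue(True)

    # @unittest.skip("un-comment to defer this test")
    def test_second(self):
        self.assertTrue(True)


# @unittest.skip("un-comment to defer this whole group of edge-case tests")
class EdgeCaseTest(unittest.TestCase):
    def test_edge_case(self):
        self.assertTrue(True)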

h2oboi89 commented 4 years ago

For other people who are having this problem:

I have written the following Python script, format_test.py, which uses the pytest-ordering plugin to tweak the test file so that the tests are run in order.

Usage: format_test.py <test file>.py
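
For readers who just want the gist, a rough sketch of how a rewriter along these lines could work; this is not the actual format_test.py, just an illustration that prepends a pytest-ordering mark above each test method (and assumes the pytest-ordering plugin is installed when the rewritten file is run):

import re
import sys


# Hypothetical illustration only -- not the actual format_test.py.
def add_order_marks(source):
    out = []
    order = 1
    for line in source.splitlines(keepends=True):
        match = re.match(r"(\s*)def (test_\w+)\(", line)
        if match:
            # Insert a pytest-ordering mark above each test definition.
            out.append("{}@pytest.mark.run(order={})\n".format(match.group(1), order))
            order += 1
        out.append(line)
    if "import pytest" not in source:
        out.insert(0, "import pytest\n")
    return "".join(out)


if __name__ == "__main__":
    path = sys.argv[1]
    with open(path) as handle:
        rewritten = add_order_marks(handle.read())
    with open(path, "w") as handle:
        handle.write(rewritten)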

cmccandless commented 4 years ago

@yawpitch, this slipped my attention when we were implementing the generator as it stands today. What are your thoughts on this now?

yawpitch commented 4 years ago

Right, so my thoughts haven't advanced too much on this one, but here's the current state of things:

Automatic test generation is working; however, we did not add any logic for setting a specific test ordering, or adding numerically advancing names, or including pytest marks, or otherwise trying to address this issue.

We didn't do that for a good reason: we simply cannot rely on the canonical-data.json files having taken ordering into account in a manner that makes any sense to this sort of progressive TDD approach. Many tests have been added with no discernible concern for that sort of ordering, and I simply don't see us being able to apply such an ordering to that data, especially since that repo is still locked.

Now, the automated test runner, which is intended for the features coming to the V3 version of exercism launching later this year, has already been built to apply the test definition order as requested above (https://github.com/exercism/python-test-runner/blob/master/runner/sort.py), for whatever good that does us. However, this is really only a solution for tests submitted to exercism.io infrastructure for processing ... there's no good and reliable way to enforce the same ordering on tests run locally, even if we had confidence that the order had been carefully curated.

Now, after investigating how to do this while developing the automated runner, I can say that, in theory, we could potentially offer an "official" way to do this locally by making available (or shipping with the exercises) a conftest.py file that would use the same logic to order the tests. Those users who happen to be using pytest would then get the declaration ordering, assuming that file were present in the root directory they were running pytest within.
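
As a rough sketch of the idea (this is not the python-test-runner's sort.py, just a minimal approximation): a conftest.py could hook pytest's collection phase and re-sort the collected tests by the line on which each one is defined, which recovers definition order even for unittest.TestCase methods:

# conftest.py -- minimal sketch, assuming pytest is the local runner.
import inspect


def pytest_collection_modifyitems(session, config, items):
    # Re-order the collected test items by the source line on which the
    # underlying test function or method is defined.
    def definition_line(item):
        func = getattr(item, "function", None)
        if func is None:
            return 0
        try:
            return inspect.getsourcelines(func)[1]
        except (OSError, TypeError):
            return 0

    items.sort(key=definition_line)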

But the difficulty there is that again we're effectively forcing the use of pytest as the local test runner ... if we decide that's a de facto requirement for all the students then why go to the trouble of hacking the definition order into unittest.TestCase-based tests at all?

You see, pytest itself uses definition order for tests written in pytest's "function" style by default ... the only reason the problem exists at all is because we're using unittest for compatibility reasons, and that choice in turn means that whether you use pytest or nose or just run the file directly, you're going to end up getting the unittest library's test discovery logic, which sorts a given class's tests in lexicographical order.

Thus, if we make pytest a hard requirement as the only "blessed" test runner, wouldn't we be better off simply re-writing all the templates to generate pytest-style tests anyway, and get the definition order for free in the process?
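
To illustrate, a sketch of the word-count tests from earlier rewritten in pytest's function style (the word_count module name is assumed here); plain functions are collected in definition order, so pytest -x would stop at the first test as written:

from word_count import count_words  # assumed module name for the exercise


def test_count_one_word():
    assert count_words('word') == {'word': 1}


def test_alternating_word_separators_not_detected_as_a_word():
    assert count_words(",\n,one,\n ,two \n 'three'") == {'one': 1, 'two': 1, 'three': 1}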

The script @h2oboi89 has provided above is a bit of a merciless hack, but it will work for people who want to use it, even if it does rewrite the test files -- and woe be unto anyone that accidentally gives it a path to a system critical file -- so there is a workaround. I'm not sure there's a need for a more "official" solution, but I'm open to someone investigating the conftest.py approach if they'd like.

AnAccountForReportingBugs commented 4 years ago

With unittest you can specify a sort function. Assuming the test file defines the tests in the order you like, the way I'd probably do it is to sort the test methods by line number instead of by name.

Something along the lines of:

import inspect
import unittest

if __name__ == "__main__":
    loader = unittest.TestLoader()
    # Sort key: the line on which each test method of GrepTest is defined.
    line = lambda name: inspect.getsourcelines(getattr(GrepTest, name))[-1]
    # Old-style cmp function (negative/zero/positive), as sortTestMethodsUsing expects.
    line_cmp = lambda a, b: (line(a) > line(b)) - (line(a) < line(b))
    loader.sortTestMethodsUsing = line_cmp
    unittest.main(testLoader=loader)

yawpitch commented 4 years ago

That's another workaround that can be used, but I still don't believe we should introduce it into the tests we distribute to students.

My reasoning remains the same as described above: any assumption that the tests in the various canonical data files have been entered into those files in a curated, TDD-friendly order is categorically False ... no such reliable order exists, or is ever likely to be introduced, and introducing a line-number ordering to the tests is just going to provide a false sense that such an ordering exists, when it does not.

AnAccountForReportingBugs commented 4 years ago

We agree on the underlying assumption (whether the canonical data should have such an order is a different question); I just wanted to point out that it is possible to get unittest to run tests in whatever order you want using arbitrary functions, in case you do come up with such an order in the future, some exercise requires it, or you want local tests to run in the same order as the automated runner you referenced.

yawpitch commented 4 years ago

I just wanted to point out that it is possible to get unittest to run tests in whatever order you want using arbitrary functions...

And that's helpful; however, I'm of the mind we should close this issue as wontfix because of that lack of guarantees in the upstream data. The canonical data repo is currently locked to major changes and it remains to be seen precisely how it will be incorporated into V3, so I don't think we're able to meaningfully move on this right now except by introducing something like that logic, which, without those same guarantees, really just slows down the tests to provide a false sense of order.

Anyone strongly disagree with that?

cmccandless commented 4 years ago

In favor of won't fix

yawpitch commented 4 years ago

Closing as wontfix.