RFC: Detect when suites need to be run in serial

smspillaz commented 9 years ago

This is an RFC style patch to run suites that define setUpClass, tearDownClass, setUpModule and tearDownModule in serial vis-a-vis each other.

There's a few rough edges, and a few places where I need some comments and ideas on what to do.

The biggest missing part so far is that KeyboardInterrupt handling isn't there. If there's a container in Python which represents "value or exception" kind of like multiprocessing's AsyncResult, then that'd be nice.

toParallelTestTargets subdivides a GreenTestSuite into a set of strings representing loadable targets for the runner. It actually subdivides all the way at the moment until we reach something that looks like a unittest.TestCase, at which point it checks that class and its module to see if any serial-indicating methods are defined such that only the class or module name should be returned.

poolRunner was also adjusted to return its results via a queue - it may run multiple tests now and needs to indicate to the parent runner both when a test has started and finished.

CleanCut commented 9 years ago

That is some impressive work! Having worked on green so much myself, I can guess how much time and effort you put into this.

I really appreciate you bringing this every-test-is-loading-and-running-class-and-module-setup-and-teardown issue to my consciousness. I should have been aware of this fact, but as my day-job organization (the primary place where I use this project) doesn't use module or class setup/teardown, this issue is present much more as a side effect of the other design decisions, rather than as a conscious choice.

I read through all your code, and I spent a few hours pondering about what would be the best thing to do. There are several viable options that I came up with (one of which would be polishing up your code), and in the end I hope you like the conclusion I came to.

I'm going to lead you through my reasoning first:

1) I think I went down a non-optimal path in the first place with the fundamental design where each dotted test name is passed individually to subprocesses and each test is loaded completely independently. I did consider the effect of a module potentially being loaded in each subprocesses, and each test creating an independent copy of the entire TestCase object. At the time I judged that as an acceptable trade-off for a fairly straight-forward loading design (which has become less and less straight-forward as time has gone on), since importing modules and creating TestCase objects tend to be pretty minor time-wise in comparison to the actual tests themselves

However, I failed to even consider the module and class setup/teardown cases. As you pointed out, duplicating that logic, which it can be reasonably assumed is done at that level because it is time consuming, is going to tend to impose an unacceptable performance cost

2) Green's code regarding multiprocess functionality is more complicated than I would like it to be (before this pull request), which I think has prevented several potential contributors from really digging in and making their own pull requests.

After your pull request, quite frankly I don't think even I would want to touch green's multiprocess stuff any more. It's getting really complicated to follow. Really hard to fit the design in your head all at once. [It may not seem too bad to you now, but wait until you come back to it after a year of not touching it much - trust me. :-) ]

3) I would really like green to run in multiple processes by default. Why doesn't it? Because a lot of existing tests that were written without running them through green are not multi-process safe. They assume most often that the tests in a TestCase class are run in a specific serial sequence. Less often they assume that the TestCases in a module are run in a specific serial sequence. However, I haven't often (maybe ever?) noticed any assumptions that multiple modules would be run in a specific serial sequence...

Which brings me to my conclusion. After having considered several different options, I think the best course of action is to scrap the current design which transmits-dotted-test-names one by one to subprocesses which then load them up as if each were completely independent. Then replace that with a design that primarily transmits dotted names of entire modules, which are then loaded up and run "normally" within a given subprocess.

PROS

No special code needed to handle module setup/teardown
No special code needed to handle class setup/teardown
Assuming a relatively high number of modules, overall test runs can be much faster due to much less duplicate importing and setup/teardown
We could end up with less code and a simpler design than we have right now, which is both easier to maintain and easier for others to start contributing to.
We should be able to have green automatically pick the number of processes to run by default (instead of just using 1) while still having a high degree of confidence that most existing test suites will not break (or will require minimal modifications to make concurrent-safe). This alone should make most non-customized green runs faster by default (assuming more than one module is present).
I'm willing to pitch in and give this a try

CONS

You can only get performance enhancements from concurrency at the module level. Only have 1 module? Then no speedup is possible, by design. (Though you still get the other benefits of running the tests through green)
The existing design of green's process handling will need to be significantly refactored. That's still real work.
Output may not be as smooth as it is currently. Instead of individual unit tests being run in subprocesses, and then output coming out in order fairly smoothly (only individual long tests causing output hiccups), instead you will get a single process's test results at 1-processor speed, and then when it is done you may have a huge chunk of output ready from other subprocesses that will then blink past very quickly.
When a TestCase or individual test method is specified on the command-line as the item to test...then we'll have to have separate logic just for handling that.

That's my take on things, anyway. What are your thoughts?

On a completely separate, but very related note, I have a branch where I am polishing per-subprocess setup/teardown. I have held off merging it and releasing it, because I have been making lots of backward-incompatible option and behavior changes.

That is where I would likely also start acting on my proposal above. If you are interested, I invite you to branch off of that branch instead of master.

smspillaz commented 9 years ago

Okay, makes sense. I was actually just thinking now that if you've got more than four modules, then you're going to have full saturation on most systems anyway.

I can probably trivially modify this patch to just work on the module level, passing the actual test case or test names when the user specifies them.

My concerns about how to handle KeyboardInterrupt and friends remain though. On the plus side, we'll be using far fewer queues, so I suppose that problem falls away.

CleanCut commented 9 years ago

Closing this out, since we implemented this all via a different pull request.

CleanCut / green

RFC: Detect when suites need to be run in serial #69