Extract a generic framework for running randomized tests. [LUCENE-3492]

asfimport commented 13 years ago

The work on this issue is temporarily at github (lots of experiments and tweaking): https://github.com/carrotsearch/randomizedtesting Or directly: git clone git://github.com/carrotsearch/randomizedtesting.git {color}

RandomizedRunner is a JUnit runner, so it is capable of running @Test-annotated test cases. It respects regular lifecycle hooks such as @Before, @After, @BeforeClass or `@AfterClass`, but it also adds the following:

Randomized, but repeatable execution and infrastructure for dealing with randomness:

uses pseudo-randomness (so that a given run can be repeated if given the same starting seed) for many things called "random" below,
randomly shuffles test methods to ensure they don't depend on each other,
randomly shuffles hooks (within a given class) to ensure they don't depend on each other,
base class RandomizedTest provides a number of methods for generating random numbers, strings and picking random objects from collections (again, this is fully repeatable given the initial seed if there are no race conditions),
the runner provides infrastructure to augment stack traces with information about the initial seeds used for running the test, so that it can be repeated (or it can be determined that the test is not repeatable – this indicates a problem with the test case itself).

Thread control:

any threads created as part of a test case are assigned the same initial random seed (repeatability),
tracks and attempts to terminate any Threads that are created and not terminated inside a test case (not cleaning up causes a test failure),
tracks and attempts to terminate test cases that run for too long (default timeout: 60 seconds, adjustable using global property or annotations),

Improved validation and lifecycle support:

RandomizedRunner uses relaxed contracts of hook methods' accessibility (hook methods can be private). This helps in avoiding problems with method shadowing (static hooks) or overrides that require tedious super.() chaining). Private hooks are always executed and don't affect subclasses in any way, period.
@Listeners annotation on a test class allows it to hook into the execution progress and listen to events.
@Validators annotation allows a test class to provide custom validation strategies (project-specific). For example a base class can request specific test case naming strategy (or reject JUnit3-like methods, for instance).
RandomizedRunner does not "chain" or "suppress" exceptions happening during execution of a test case (including hooks). All exceptions are reported as soon as they happened and multiple failure reports can occur. Most environments we know of then display these failures sequentially allowing a clearer understanding of what actually happened first.

Screen Shot 2011-10-06 at 12.58.02 PM.png

Migrated from LUCENE-3492 by Dawid Weiss (@dweiss), resolved Feb 14 2012 Attachments: Screen Shot 2011-10-06 at 12.58.02 PM.png Linked issues:

4857
- 4563

asfimport commented 13 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Static fixtures couldn't be handled with a rule, so I've decided to rewrite JUnit Runner instead of subclassing it. Lots of frustration so far, but I like the result :)

asfimport commented 13 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

I've implemented a runner that follows the basic algorithm given in #4563. Basically speaking, seeds for each test run are fixed derivations of a single master seed (used for the runner and all class-level fixtures) and don't rely on the order of invocations or other factors.

There's plenty of ways to tweak and tune by overriding class-level @Seed, method-level @Seed. @Repeat gives you control on how many times a given test is executed and whether a seed is reused (constant for each iteration) or randomized (predictably from the start seed).

Most of all, everything fits quite nicely in Eclipse (and I hope other GUIs... didn't check Idea or Netbeans though) because each executed test run is nicely described in the runner (full seed), so that you can either click on it and re-run a single test or write down the seed and fix it at runtime.

Lots of TODOs in the code, will continue in the evening.

asfimport commented 13 years ago

Shai Erera (@shaie) (migrated from JIRA)

This is only for debugging from an IDE right? It does not replace tests.iter and tests.seed?

It looks very cool.

It also adds a risk that someone will accidentally commit tests with these annotations. So perhaps we should add pre-commit hooks, or a test that scans all test files and ensures those annotations do not exist?

asfimport commented 13 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Hi Shai. This is definitely not only for debugging. For example we use randomized testing inside CarrotSearch to test algorithmic/ combinatorial code. Once you hit a bug, you simply copy the test case (or a call to a common test case method) and fix the seed to have a regression test for the future (so that you know you're not failing examples that previously failed). So, for example:

`@Test` @Seed("23095324")
public void runFixedRegression_1 { doSomethingWithRandoms(); }

`@Test` @Seed("239735923")
public void runFixedRegression_2 { doSomethingWithRandoms(); }

`@Test`
public void runRandomized { doSomethingWithRandoms(); }

This is a scenario I really came to like. It's a bit like your tests write themselves for you :)

I left system properties for fixing seeds and enforcing repetition number because they are currently in Lucene, although I personally don't like them that much (because they affect everything globally). I do understand they're useful for quick hacking without recompiling stuff or for remote executions, but I'd much rather have something like -Dseed.testClass[.method]=xxxx which would affect only a single class or method rather than everything. The same can be done for filtering which method/ test case to execute. This is debatable of course and a matter of personal taste.

I should publish what I have tonight on github (I'm moving certain things out of our proprietary codebase and there are JUnit corner cases that slow things down).

asfimport commented 13 years ago

Shai Erera (@shaie) (migrated from JIRA)

Ok I get the point now.

But I still think we should have specific unit tests that reproduce specific scenarios, than using some monstrous tests that happened to stumble on a seed that revealed a bug. If however the scenario cannot be reproduced deterministically, then I agree that this framework is powerful and useful.

asfimport commented 13 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Sure, absolutely. In our (mostly algorithmic, mind you) experience even small test cases can be randomized and then it is really duplicated effort to re-write them for a particular bug scenario (the tests are often simple, the data changes). But sure: the simpler the test, the better.

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I agree too. one difficulty with using @seed or something is our seeds quickly become out of date because we are often adding more randomization to our testing framework (e.g. additional craziness to randomindexwriter, searchers, analyzer, whatever)

asfimport commented 13 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

That's why I mentioned I would like this to become generally useful, not only restricted to Lucene/Solr :) If we make it work for two projects (Carrot2 and Lucene) chances are the outcome will be flexible enough to use elsewhere.

I'm not saying you must fix the seeds using annotations – it's an option.

asfimport commented 13 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Ok. I've published the project on github here: https://github.com/dweiss/randomizedtesting

The repo contains the runner, some tests and examples. Lots of TODOs (in TODO), so consider this a work-in-progress, but if anybody cares to take a look and shout if something is definitely not right – go ahead.

mvn verify on the topmost project compiles everything and runs the tests/ examples. I don't see any functional deviations or differences in execution between ant maven and my Eclipse GUI (mentioned by Robert) which is good.

asfimport commented 13 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

A word of warning: this will be a longer comment. I still hope somebody will read it ;)

I've written a somewhat largish chunk of code that provides an infrastructure to run "randomized", but "repeatable" tests. I'd like to report on my impressions so far.

Robert was right that a custom runner provides more flexibility than a @Rule on top of the default JUnit runner (which changes depending where you run it – ant, maven, Eclipse, etc.). I've spent a lot of time inspecting the current implementation inside JUnit and I came to the conclusion that it really is best to have a full reimplementation of the Runner interface. Full meaning not descending ParentRunner, but implementing the whole runner from scratch. This provides additional, uhm, unexpected benefits in that one can add new functionality that "regular" JUnit runners don't have and still be compatible with hosting environments such as Ant, Maven or Eclipse (because they, thank God, respect @RunWith).

Among the things I have implemented so far that are missing or different in JUnit are:

There is a "context" object which is accessible via thread local, so @BeforeClass and other suite-level hooks can actually access the suite class, inspect it, check conditions, whatever (the runner's random seed is also passed via this context). This is useful, but not crucial.
I've decided to deviate from JUnit strict policy of having public hook methods. By default this only causes headaches when one shadows or overrides a hook in the parent class and it is no longer invoked. A better (different) idea is to declare hooks as private; no shadowing occurs and they will all get invoked in a contractual predefined order (befores - super to class, afters - class to super).
I've added additional suite-level annotations. @Listeners provides listeners automatically hooked to RunListener. @Validators hooks up additional validators for verifying extra restrictions. An example of such a restriction is bailing out the test suite if shadowed or overridden methods exist in the class hierarchy of a suite class. Another (that I have implemented) is a validator checking for non-annotated testXXX methods that are dead JUnit3 test cases. You get the idea. A lot of code then simply vanishes from LTC; I can envision it having this shape:
```
`@Listeners`({
StandardErrorInfoRunListener.class})
`@Validators`({
NoHookMethodShadowing.class,
NoTestMethodOverrides.class,
NoJUnit3TestMethods.class})
public abstract class LuceneTestCase extends RandomizedTest {
...
}
```
Some of these things are currently verified using a state machine (calling super() in overridden methods), but this just looks better to me to take away this concern elsewhere rather than implement it inside LTC.
The entire lifecycle of handling test method calls and hooks is controlled in the runner. I made a design decision to not follow JUnit's insane wrap-wrap-wrap-exception style but instead report all exceptions that happen anywhere in the lifecycle. So if you get an exception in the test case, followed by an exception in @After, followed by an exception in `@AfterClass`, all these exceptions will be reported separately to the RunListener and in effect to all listening objects (in the lifecycle-corresponding order!). Such an implementation does work with fine with ANT JUnit reports, maven reports and in Eclipse (all exceptions are included) so far as I can tell – didn't check other environments like NetBeans or IntelliJ. Again: in my personal opinion this is a much clearer way of dealing with exceptions in the lifecycle of JUnit test case compared to wrapping them into artificial exceptions (MultipleException being a supreme example) or suppressing them altogether.
I couldn't resist a tiny tweak of making any exceptions thrown from hooks or test methods carry the information about the seed used in their execution (both runner-level and method-level, even though the latter could be derived from the former). There is no easy way to do it because Throwables are designed not to allow changes to their content once constructed. With the exception of stack traces :) So I simply inject a debugging info inside the stack trace as an artificial entry; what it looks like is here, for instance:
```
java.lang.Error: Blah blah exception message.
at __randomizedtesting.SeedInfo.seed([60BDF6E574486C2:60BDF6E76C930BC]:0)
at […].examples.TestStackAugmentation$Nested.testMethod1(TestStackAugmentation.java:29)
```
(Note how the seed info is inside the file position of StackTraceEntry object.). This may seem like overly clever solution, but I've had it many times that sysouts got discarded or lost somehow and an exception object along with the stack trace is always there in front of your eyes. Another way to capture-and-dump reproduction info is to use @Listeners annotation above; this can be used for much what LTC does today – -D…, -D…, -D...
A custom runner can have custom implementation of the contractual "events", such as assumptions or ignore triggers. This takes away a lot of code related to trying to get around JUnit's API limitations (assume without message/cause, method filtering and dynamic ignores based on extra conditions like @Nightly, etc.).

In short: I'm really happy with a custom Runner.

As for the infrastructure for writing randomized test cases:

There is currently one "master" seed that the runner either generates randomly or accepts as a global constant. Everything else: method shuffling, initial random instance for each test case (method repetition)… really everything is based on sequential calls to this generator. This has advantages and disadvantages I guess (read about static initializers below), but it was my personal desire to implement it this way and based on my few days' worth of experience with this code, it works great.
I've written a base class RandomizedTest that extends Assert and has a number of utility methods for picking random numbers or objects from collections. There is no passing of explicit Random instances around like it is done currently in LTC though. The base class accesses the context's Random (which it is assigned by the runner) and then uses this random consistently to generate pseudo-randomness in selection of attributes and iterations. Of course once you go multi-threaded this will all go to dust, but I imagine multi-threaded tests shouldn't use the base class's randomness (a test case based on race conditions won't be repeatable anyway). If anything, generate per-thread Randoms based on current seed and let each thread handle its own sequence of pseudo-random numbers from there. This is even possible at runtime with non-mock objects as I'm going to show in Barcelona, hopefully.

Now… if you're still with me you're probably interested how this applies to Lucene. The wall I've hit is the sheer amount of code that any change to LTC affects. I realized it'd be large, but it's just gargantuan :)

The major issue is with static initializers and static public methods called from them that leave resources behind. I'm sorry, but nobody can convince me this isn't evil. I understand certain things are costly and require a one-time setup, but these should really be moved to @BeforeClass fixture hooks. If one really needs to do things once at JVM lifespan level a @BeforeClass with some logic to perform a single initialization can be a replacement for a static initializer (even if it's unclear to me when exactly such a fixture would be really needed). In short: the problem with static initializers is that they are executed outside the lifecycle control of the runner… I'd say most of the problems and current patchy solutions inside LTC (dealing with resource tracking for example) are somehow related to the fact that static initializers and static method calls are used throughout the codebase.

I am currently wondering if it's feasible to provide a single patch that will make a drop-in replacement of LTC. It may be the case that adding another skeleton class based on the "new" infrastructure and rewriting tests one by one to use it may be a more sensitive/ sensible way to go.

The runner (alone) is currently at github if you care to take a look. I think Barcelona may be a good place to talk about this face to face and decide what to do with it. I'm myself leaning towards the: have parallel base classes and port existing tests in chunks.

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I am currently wondering if it's feasible to provide a single patch that will make a drop-in replacement of LTC. It may be the case that adding another skeleton class based on the "new" infrastructure and rewriting tests one by one to use it may be a more sensitive/ sensible way to go.

Could we add this stuff, cut over our runner to it (our runner doesnt actually do that match?) and then migrate our base class functionality piece by piece to the runner code (nuking it from LuceneTestCase?)

asfimport commented 13 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Yeah, I was thinking about that too and I've actually started, replaced the runner successfully but then there is no simple "piece by piece" with all the static method calls tangled together... And RandomizedRunner requires a different seed format etc... I'll give it another shot though.

asfimport commented 13 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

In response to my question whether the idea of randomized testing is new Yuriy Pasichnyk passed me the info about Haskell's QuickCheck project. Indeed, the idea is pretty much the same (with differences concerning implementation details, not the concept itself).

http://en.wikipedia.org/wiki/QuickCheck

There is a Java port of this too, if you check out Wikipedia. The implementation follows a different direction compared to what I implemented, but there are also pieces that are nearly 1:1 identical copies. Good to know – this means I wasn't completely wrong in my goals.

asfimport commented 12 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

I consider this issue done. I've pulled all the super-cool features from Lucene/Solr and put them into a separate, stand-alone and reusable project called randomizedtesting. We have switched our infrastructure at Carrot2 to use it and I'm very happy with it.

http://labs.carrotsearch.com/randomizedtesting.html

I will file another issue concerned with progressively moving from LuceneTestRunner/Case and tests infrastructure to RandomizedRunner (and siblings).

apache / lucene

Extract a generic framework for running randomized tests. [LUCENE-3492] #4566

4857

4563