Refactor test classes that use assumeFalse(codec != SimpleText, Memory) to use new annotation and move the expensive methods to separate classes [LUCENE-3489]

asfimport commented 13 years ago

Folloup for #4537.

TODO:

Move test-methods that need the new @UseNoMemoryExpensiveCodec annotation to separate classes
Eliminate the assumeFalse-calls that check the current codec and disable the test if SimpleText or Memory is used

Migrated from LUCENE-3489 by Uwe Schindler (@uschindler), updated May 09 2016 Attachments: LUCENE-3489.patch (versions: 3) Linked issues:

4537
- 4502
- 4566

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

you know, another similar issue we have is tests that assumeFalse(codec != PreFlex), because of things like new index statistics or byte terms or other features that it doesn't support.

Maybe there is some way we could generalize the annotation?

something like @AvoidCodecs("SimpleText", "Memory"), @AvoidCodecs("PreFlex"), and this set would be handled like the boolean today?

asfimport commented 13 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Nice idea, this can be easily transformed to a annotation with param! Of course it would be per-class, too.

asfimport commented 13 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

While working on my presentation I've been trying to generalize the concept of "randomized tests". There's definitely a lot of great concepts here, but they're closely coupled with Lucene and the rest of the Solr/Lucene infrastructure.

I have a draft code of a RandomizedTesting framework that provides very much similar functionality, although in a slightly different technical way (for example it's based on JUnit @Rules only, not on a custom runner/ base abstract class). I would really like you to peek at this and, perhaps with some effort, generalize the concepts in the test framework instead of introducing more Lucene-specific annotations.

I'll publish the code tomorrow (it still needs some, ehm, polishing) along with some thoughts that I had about the current code in LuceneTestCase/Runner.

I'd really like this to evolve into something of a stand-alone project (even if bundled with Lucene) so that other project can benefit without necessarily rely on the rest of Lucene code. We're already using a somewhat decoupled code internally and making it really cross-project applicable is a great way of proving these concepts are generally useful.

Until tomorrow? :)

asfimport commented 13 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

(for example it's based on JUnit @Rules only, not on a custom runner/ base abstract class)

This is a nice idea, although the reason for the LuceneTestCaseRunner and the abstract base class is more because we hate @Test annotations and dont want to add thousands of stupid Assert-includes (having this is abstract base class is more convenient). Just to mention UweSays on twitter: @UweSays

asfimport commented 13 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Yeah, I figured that you want to keep it compact. These may be compatible because there's nothing forbidding us to keep LuceneTestCase as a base class (descending from Assert and providing Lucene-related infrastructure). I'm just trying to push all the randomization (seed initialization, reproducibility, thread controls) out of LuceneTestCase and into something more generic. So far it looks good to my eyes, but I'll be looking forward to your strict German opinion, Uwe :)

Oh, by the way – is there any particular reason for so many things to be static (class level)? I get these are fixtures reused by tests but would people scream if they were object-level fixtures rather than class-level fixtures? It'd make things a bit easier... starting with the need for a single initial seed, for example.

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Oh, by the way – is there any particular reason for so many things to be static (class level)? I get these are fixtures reused by tests but would people scream if they were object-level fixtures rather than class-level fixtures? It'd make things a bit easier... starting with the need for a single initial seed, for example.

why we have the different seeds: One thing we do is support running a test class (test1(), test2(), test3()). If test2() fails, we want to be able to just run that method and reproduce it. So we allow you to specify -Dtestmethod to only run a single method.

At the same time, we want to support doing things like creating indexes in beforeClass() and afterClass() for efficient tests. We also support -Dtests.iter, where you run a single test method over and over... this is often convenient. If we only had 1 class-level seed, this would be useless as it would just do the same thing over and over!

So the need for multiple seeds comes from the fact that some things are random at "class-level" and some things are at "method level". If you look at the 3 parts to the random seed, its really part1:part2:part3,

part1 = class seed
part2 = method seed
part3 = runner seed (this is needed for consistent randomization of test methods)

asfimport commented 13 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Yeah, I figured that you want to keep it compact. These may be compatible because there's nothing forbidding us to keep LuceneTestCase as a base class (descending from Assert and providing Lucene-related infrastructure).

Yes, I just wanted to mention this.

The other stuff in LuceneTestRunner is just to work around some limitations in JUnit's @BeforeClass: @BeforeClass does not pass the Class object to the annotated method, and you cannot find out which child class is initialized. So checking for annotations on the implementation class from the abstract LuceneTestCase base class does not work.

Oh, by the way – is there any particular reason for so many things to be static (class level)? I get these are fixtures reused by tests but would people scream if they were object-level fixtures rather than class-level fixtures? It'd make things a bit easier... starting with the need for a single initial seed, for example.

The reason is simple: We want those per test-class lifetime, but JUnit allocates a new class instance for each test method. And lot's of Lucene tests use @BeforeClass to produce indexes (random) static indexes, then used by all test methods in a read-only way. Currently we have 3 seeds, one for class-level stuff, one for instance stuff and a third one for the runner (according to Mister @rmuir: LUCENE-3362).

The randoms must therefore be static and initialized in @BeforeClass.

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

The need for the runner seed is explained in #4435. One problem is the "Test lifecyle" of junit is ill-defined, it depends on how you are running tests!

The problem is that via ant, tests work like this (e.g. for 3 test classes):
computeTestMethods
beforeClass
afterClass
computeTestMethods
beforeClass
AfterClass
computeTestMethods
beforeClass
afterClass

but via an IDE or maven, if you run it from a folder like you did, then it does this:
computeTestMethods
computeTestMethods
computeTestMethods
beforeClass
afterClass
beforeClass
afterClass
beforeClass
afterClass

asfimport commented 13 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Thanks. I did go through the code, so I know where the seeds are used and had a pretty much good understanding as to why.

As for the different lifecycle - this is weird, but isn't it a direct consequence of subclassing BlockJUnit4ClassRunner and relying on what it internally does? This is what's causing the problem (superclass impl. changing over time - I think you just hit two different junit versions in that issue).

Perhaps I was wrong and a custom runner is indeed needed, but if so then I still think a single seed (logically) would be fine. A custom runner then:

collects methods to be executed (per-class)
initializes the global init seed/ random. This random becomes the initial randomness source for everything that follows to make it repeatable.
shuffles methods,
execute any @BeforeClass rules (see note below),
for each selected method (-Dtestmethod limits the selection and acts as a filter), repeat test.iter-times (seed changes predictably): {initialize per-method starting seed based on the current random, create a new test instance, execute}.
execute any @AfterClass rules

The question how to randomize class-level fixtures could be answered by a static utility method that would return the per-class seed using ThreadLocal or a thread map updated by the runner. Still predictable and repeatable.

I'll chew a bit on the possibilities and report back tomorrow.

asfimport commented 13 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Re-reading the above algorithm I think I'll make it clearer: my point is that you can write repeatable runner by starting from a single initial seed and assigning initial seeds to all execution start points (tests) regardless of whether they are executed or not (and how many times). Hope I'm a bit clear(er) now.

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

This is what's causing the problem (superclass impl. changing over time - I think you just hit two different junit versions in that issue).

I disagree. I used the same junit version (4.7) myself in both eclipse and via ant to deal with this problem. It has nothing to do with that.

The junit test lifecycle is really undefined just as I described, its unfortunate.

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

And just so you know, its not possible i could have used 4.8 here, because all of our tests fail with 4.8

Thats because of breaks in the lifecycle of TestWatchMan (Its initialized before the @Before's in 4.8, not in 4.7). A separate problem, but just something to mention. currently you cannot use junit 4.8 with lucene's tests for this reason.

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

One last thing, thinking thru the simplifications Dawid is looking at doing, and knowing how horrible the code currently is, we could consider trying some things like:

upgrade/fix our tests to work with latest junit? maybe there are less frustrations
contribute some of the more general things like assume(String message, xxx) to junit to get them out of our codebase?

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

attached is a patch generalizing the UseNoExpensiveMemory annotation to @AvoidCodecs that takes a list of codecs to avoid.

This way, tests that cannot work with Lucene3x codec can just avoid it, using another codec, rather than assuming (in general its bad that many of the tests of actual new functionality often dont run at all because of the current assumes)

asfimport commented 12 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I like the annotation. Can we maybe change it to look like @SuppressWarnings? So it does not need codecs={} or if there is only one codec, no {} at all? Should be not too hard?

Otherwise strong +1!

asfimport commented 12 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

It's easy, just rename codecs to "String[] value" and you are done. After that you can use @AvoidCodecs("SimpleText") or @AvoidCodecs({"SimpleText","Lucene3x"})

See: http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/SuppressWarnings.html

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I agree, this is the main problem with the current patch. We should fix this before committing.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

updated patch using value[], much less wordy.

I will commit soon.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

ok one last change, renamed to SuppressCodecs (it actually is not just funny, but better since it works the same way etc)

asfimport commented 11 years ago

Steven Rowe (@sarowe) (migrated from JIRA)

Bulk move 4.4 issues to 4.5 and 5.0

asfimport commented 10 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Move issue to Lucene 4.9.

apache / lucene

Refactor test classes that use assumeFalse(codec != SimpleText, Memory) to use new annotation and move the expensive methods to separate classes [LUCENE-3489] #4563

4537

4502

4566