NOAA-OWP / wres

Code and scripts for the Water Resources Evaluation Service

As a developer, I want better system test coverage for combinations of projects, without cleaning #199

Open epag opened 3 weeks ago

epag commented 3 weeks ago

Author Name: James (James) Original Redmine Issue: 60317, https://vlab.noaa.gov/redmine/issues/60317 Original Date: 2019-02-19


Given a desire to test as many operational situations as practicable before a candidate is released, when I develop system tests to implement that goal, then I want to consider more situations in which the database has not been cleaned.

I seem to be unique in executing the WRES without cleaning regularly. I have seen many failures (generally in the execution of scripts) under these circumstances that are only solved by cleaning the database.

I understand that, when I see a specific failure, I should open a ticket and try to document the reproducible conditions. However, in reality, I don't always do this, because I'm in the middle of something else and these types of failures involve interactions that are hard to reproduce (sorry, an admission of guilt).

As a starting point, I would like to see us implement one sequence of system tests in which we execute all tests without cleaning. Another possibility is that we randomize the order in which this sequence of tests is executed (and document the selected order so that it can be reproduced). In general, tests should be deterministic, not random. Here, though, each individual test would remain deterministic, as would the collection of tests as a whole; only the sequencing would change (much like renaming a JUnit test method). The aim is to increase confidence over the long run. Other suggestions are very welcome.
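A minimal sketch of the "randomize, but document the order" idea, using only the Java standard library. The class name `ShuffledSequence`, the scenario names and the use of a command-line seed are assumptions for illustration, not part of the existing system test harness. The point is that recording the seed is enough to replay the exact order later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper: not part of the WRES system tests.
final class ShuffledSequence {

    /** Returns a shuffled copy of the scenarios; the same seed always yields the same order. */
    static List<String> order(List<String> scenarios, long seed) {
        List<String> shuffled = new ArrayList<>(scenarios); // leave the input untouched
        Collections.shuffle(shuffled, new Random(seed));
        return shuffled;
    }

    public static void main(String[] args) {
        // Placeholder scenario names, assumed for this example.
        List<String> scenarios = Arrays.asList("scenario1", "scenario2", "scenario3", "scenario4");

        // Reuse a seed passed on the command line to reproduce an earlier run,
        // otherwise pick a fresh one.
        long seed = args.length > 0 ? Long.parseLong(args[0]) : System.currentTimeMillis();

        // Log the seed alongside the order so a failing sequence can be replayed.
        System.out.println("seed=" + seed + " order=" + order(scenarios, seed));
    }
}
```

Re-running with the logged seed as the first argument should reproduce the same order, provided the list of scenarios has not changed in the meantime.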


Related issue(s): #187 Redmine related issue(s): 51655, 69087


epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2019-02-19T12:12:35Z


James,

I'm fine with what you propose.

Removing the cleaning is easy: just get rid of the 0-byte CLEAN files.

As for randomizing the order, JUnit should help to facilitate this, right? Looking at the website,

https://github.com/junit-team/junit4/wiki/test-execution-order

Does one of the ordering schemes mentioned represent what you want? I think the closest to truly random would be @FixMethodOrder(MethodSorters.JVM), which leaves the methods in whatever order the JVM returns them (and that can vary from run to run), while the default ordering since 4.11 (would need to confirm the version) is arbitrary but consistent, which I think is what this means: "deterministic, but not predictable".

Thanks,

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2019-02-19T12:27:28Z


Hank,

To be clear (and I don't think you are necessarily saying this either), I don't want to replace our existing system test sequence, but add one. We may want to aggregate the reporting via e-mail, however.

Agreed, we have options available in JUnit. The default sequencing is by the hash of the method name, but the sequencing can be explicit, as you say. In general, I think there's very little need for this functionality, because unit test outputs should not depend on execution order (if they do, something probably smells), but I do think it's relevant here.

James
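For reference, a rough sketch of what ordering by method hash amounts to, and why it is "deterministic, but not predictable". It assumes the default sorter compares the hash codes of the method names; the class `HashOrder` is hypothetical and works on plain strings rather than on real test methods. The tie-break on the name itself is a simplification for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only: sorts names, not java.lang.reflect.Method objects.
final class HashOrder {

    /** Orders names by String.hashCode(), breaking ties lexicographically. */
    static final Comparator<String> BY_NAME_HASH =
            Comparator.<String>comparingInt(String::hashCode)
                      .thenComparing(Comparator.naturalOrder());

    /** Returns a sorted copy; the result does not depend on the input order. */
    static List<String> sort(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(BY_NAME_HASH);
        return sorted;
    }

    public static void main(String[] args) {
        // Placeholder test method names, assumed for this example.
        List<String> names = new ArrayList<>();
        names.add("testScenario1");
        names.add("testScenario2");
        names.add("testCleanDatabase");
        System.out.println(sort(names));
    }
}
```

The order is stable across runs because `String.hashCode()` is specified by the language, but renaming a method changes its hash and can move it anywhere in the sequence.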

epag commented 3 weeks ago

Original Redmine Comment Author Name: Chris (Chris) Original Date: 2019-02-19T13:28:20Z


I don't know if this helps, but my current test scripts run from the scenario0 series up through the scenario1** without cleaning, clean, run from the scenario1 series to the scenario0** series without cleaning, then clean and run each test in between three times. I have yet to encounter any issues specific to clean vs "uncleaned" databases (probably need a better term for that). If a test returns the wrong result, it will generate the wrong result in both cases.

These issues generally arise when there is a change in storage. If we run a series of tests on a clean database, then run a series of tests on an "unclean" database (we probably need a better term), these issues won't arise. As a result, cleaning every once in a while is strongly recommended; the data in your database really shouldn't be kept for long periods of time.

epag commented 3 weeks ago

Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2019-02-21T22:37:42Z


My comment from the meeting which I repeat here:

The order should be pseudo-random not truly random so that we can make comparisons between sets of runs. Furthermore, the pseudo-randomly ordered runs should be stopped when a new commit comes in, and the next set of runs run on the new commit using the same pseudo-randomly-changing order.

Abbreviated example with four scenarios:

  1. Push with version A detected: kick off pseudo-random tests using seed "1" with version A
  2. First test order is scenario4 scenario2 scenario1 scenario3, begins run with version A
  3. First test set finishes with version A
  4. Second test order is scenario3 scenario1 scenario2 scenario4, begins run with version A
  5. Push detected: cancel second run with version A, kick off pseudo-random tests using seed "1" with version B
  6. First test order is scenario4 scenario2 scenario1 scenario3, begins run with version B
  7. First test set finishes with version B
  8. Second test order is scenario3 scenario1 scenario2 scenario4, begins run with version B
  9. Second test set finishes with version B
  10. Third test order is scenario2 scenario3 scenario1 scenario4, begins run with version B ...

Something like that.
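The scheme above could be sketched roughly as follows, using a single seeded `java.util.Random` per commit so that the first, second, third, ... orders are the same for every version. The class `OrderSequence`, the scenario names and the seed value are assumptions for illustration; detecting pushes and cancelling runs is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper: yields the same sequence of test orders for a given seed.
final class OrderSequence {

    private final List<String> scenarios;
    private final Random random;

    OrderSequence(List<String> scenarios, long seed) {
        this.scenarios = new ArrayList<>(scenarios);
        this.random = new Random(seed);
    }

    /** Returns the next order in the sequence, always shuffling from the base list. */
    List<String> next() {
        List<String> order = new ArrayList<>(scenarios);
        Collections.shuffle(order, random);
        return order;
    }

    public static void main(String[] args) {
        // Placeholder scenario names and seed, assumed for this example.
        List<String> scenarios = Arrays.asList("scenario1", "scenario2", "scenario3", "scenario4");
        long seed = 1L;

        // Version A: two test sets start before a new push arrives.
        OrderSequence versionA = new OrderSequence(scenarios, seed);
        System.out.println("A, first order:  " + versionA.next());
        System.out.println("A, second order: " + versionA.next());

        // Push detected: restart from the same seed with version B.
        OrderSequence versionB = new OrderSequence(scenarios, seed);
        System.out.println("B, first order:  " + versionB.next());
        System.out.println("B, second order: " + versionB.next());
        System.out.println("B, third order:  " + versionB.next());
    }
}
```

Because each version restarts from the same seed, the Nth test set of version B runs in the same order as the Nth test set of version A, which is what makes the sets comparable.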