Samraksh / TestSystem

Automated Test Suite

RFC attestation of test validity? #4

Open MichaelAtSamraksh opened 9 years ago

MichaelAtSamraksh commented 9 years ago

Call me paranoid, but I don't trust that the functionality under test is correct even when the test indicates that it passed. The TestSystem throws errors for compilation problems, but it's up to the test writer to determine and implement the pass/fail criteria. What is the SOP for a developer denoting that, on a given date, a test in the TestSuite is good and actually does what it is supposed to do?

ChrisAtSamraksh commented 9 years ago

There is an art to writing tests, to be sure. Written incorrectly, a test can easily give false assurances.

While testing the test itself, the test writer needs to purposely break the code (I generally put an infinite loop in tinyhal.cpp) and run the test to verify that it does indeed fail.
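
As a minimal sketch of that kind of temporary break (the routine name is made up; the real location in tinyhal.cpp is whatever the test exercises, and the change is reverted once the test is confirmed to fail):

```cpp
// Hypothetical routine in tinyhal.cpp with a deliberate fault injected.
// Any test that exercises this path should time out and report failure;
// if it still passes, the test itself needs work. Revert before committing.
void SomeHalRoutine()
{
    for (;;)
    {
        // Deliberate fault: spin forever.
    }
}
```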

Also, there can be problems with longevity. Our virtual timers currently all run correctly and their tests were passing, but those tests weren't running for very long periods of time. A test was written to catch this type of error, but if a problem takes hours to manifest and we have a number of tests that each take 6 hours or more to run, then a user can run tests all weekend and not finish. I'm working on a way to filter out and run separately the tests that take many hours, since it will interfere with the development process if engineers can't test all major components in a set of tests overnight.

We also have some DataStore tests that pass sometimes, but not always. This indicates the problem is not tested thoroughly enough for a pass to be definitive. I think this is the sort of problem you're asking about. This is where the art of writing the test comes in, and it comes down to the test writers.

This can be an iterative process at times. If a new bug is found and fixed, a new test should be written for that bug. The test should fail while the bug is present and pass once the fix is in the code.

MichaelAtSamraksh commented 9 years ago

Test duration

P.S. I translated Chris's post into "Writing Managed Code Tests.docx" in TestSystem/Documentation.

AnanthAtSamraksh commented 9 years ago

Many times I have seen cases where tests that pass on my machine will not work on Chris's machine, and vice versa. Before a release, all test cases should pass on at least two machines.

WilliamAtSamraksh commented 9 years ago

@AnanthAtSamraksh: Does this reflect different PCs or different .NOWs or something else? If we don't know why it's happening, the number 2 is kind of arbitrary.

AnanthAtSamraksh commented 9 years ago

I meant different PCs.

WilliamAtSamraksh commented 9 years ago

@MichaelAtSamraksh: I agree that we need to know the history of each test. I think we need something both more automated and more structured. Instead of a TestReceipts repo, keeping results in a database makes sense. It would be updated by the test system, so developers wouldn't have to keep up with comments. And reports could be run, focusing on test results at some point in time and on the result history of a given test.

Finally, versioning is important here as well. If a test has to be updated for any reason, its version number should be increased and the test system should include that version in the test results.
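
As a rough sketch of what such a per-run record might hold (field names are assumptions, not an existing schema):

```cpp
#include <cstdint>
#include <string>

// Hypothetical record the test system could write for each run; reports
// would then query by test name, point in time, or test version.
struct TestResultRecord
{
    std::string   testName;    // which test ran
    std::uint32_t testVersion; // bumped whenever the test itself changes
    std::string   emoteBuild;  // build/commit of the eMote code under test
    std::string   machine;     // PC the test ran on
    std::string   finishedAt;  // when the run completed
    bool          passed;      // outcome reported by the test
    std::uint32_t durationSec; // run time, useful for filtering long tests
};
```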

WilliamAtSamraksh commented 9 years ago

@AnanthAtSamraksh: do you know why it might differ by PC?

MichaelAtSamraksh commented 9 years ago

I just want to know the last time that a test had a quality control inspection. Running a test is a huge waste of time if it falsely passes.

WilliamAtSamraksh commented 9 years ago

Nothing beats code inspection combined with the injection of faults designed to corrupt the test itself. If, as for the COM1 test, inputs and outputs have to be the same, a test fault could randomly change the output.

I think it would be useful to have a standard set of conditional compilation symbols, say TESTFAULT01 through TESTFAULT09, that could be enabled by an option in TestRig. The test could use those symbols to conditionally include code designed to break the test.
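
A minimal sketch of how such a symbol might be used, assuming a COM1-style echo check; the helper name and corruption logic are made up, not an existing TestRig feature:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in for the real COM1 loopback transfer (hypothetical helper).
static void SendAndReceive(const std::uint8_t* input, std::uint8_t* output,
                           std::size_t len)
{
    std::memcpy(output, input, len);  // pretend the echo came back intact
}

// Echo test sketch: when TestRig defines TESTFAULT01, the received data is
// deliberately corrupted, so a correctly written test must report failure.
// If it still passes with the fault enabled, the test itself is broken.
bool RunCom1EchoTest(const std::uint8_t* input, std::uint8_t* output,
                     std::size_t len)
{
    SendAndReceive(input, output, len);

#ifdef TESTFAULT01
    if (len > 0)
    {
        output[0] ^= 0xFF;            // single-byte corruption breaks the check
    }
#endif

    return std::memcmp(input, output, len) == 0;  // pass only if echo matched
}
```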

Another QC validation is to run a test against an eMote build that's known to fail the test. If we had a suitable history, we'd know when that happened and we could replay it to verify that it fails.

ChrisAtSamraksh commented 9 years ago

I purposely broke all tests last week and verified that they all fail, so the currently checked-in tests do not give false passes. It is on my list to do that exercise again, because a lot of tests have changed and a lot of MF code has changed recently.

As far as tests passing on one machine but not another, it could be because different versions of CodeSourcery were being used at one point, and there were some hardcoded paths in the test project file.

Hardware differences could also cause problems (for example, the COM2 jumpers need to be removed in order for the COM2 test to pass).