ben-albrecht commented 5 years ago

Unit Test Straw Proposal

As a follow-on to https://github.com/chapel-lang/chapel/issues/9730 and a recent test framework deep dive, here is a strawperson design proposal for Chapel's test framework.

Proposal: Go-inspired test interface, Cargo-inspired test launcher

Provide a Test data type, which allows one to set test metadata and provides test helper methods
Use argument type in function signature to distinguish tests for functions/methods: proc f(t: Test)
Have a separate external test launcher run the test program as a subprocess, to handle halts, multilocality, and other features

Why argument-based test annotations?

Argument-based test annotations require no new language features
Alternatives would be language-supported annotations:
- Chapel does not have a user-facing language feature for pragmas today.
  - It may one day, but this is low priority currently, and would require significant design work to introduce.
- Adding a test-specific language feature is possible, e.g. test proc foo()
  - It might be odd to support a language feature specifically for 1 module
  - The language feature could be pretty complex (need to specify abitrary metadata)

Why have an external test launcher?

Enables resiliency to halts and compiler errors
Wraps the test-specific command line args, enabling a cleaner CLI

Example 1: Basic Usage

Source code:

use UnitTest;

// Some library function to be tested
proc inc(x) {
  return x + 1;
}

// test defined by argument
proc testInc(test: Test) {
  test.assert(inc(3) == 5);
}

Running through test launcher:

# Compiles and runs with the test flags that run UnitTest.main()
mason test testProgram.chpl

=========================== test session starts ============================
collected 1 test
================================= FAILURES =================================
_______________________________ testProgram ________________________________

proc testInc(test: Test) {
>       test.assert(inc(3) == 5);
E       test.assert(4 == 5);

testProgram.chpl:<line number> AssertionError
========================= 1 failed in 0.12 seconds =========================

Note: This output is just a mockup inspired by pytest output and is subject to change. The assertion introspection would require some fancy compiler features, and is not necessary.

Compiling and running the test program with test flags enabled outside of mason is a topic outside of the scope of this proposal.

Implementation Details

For the purpose of demonstrating the division between library and compiler implementation, below is sketch of what the UnitTest.main may look like:

module UnitTest {
  proc main() {
    use Reflection;

    // Assuming 1 global test suite for now
    // Per-module or per-class is possible too
    var testSuite = new TestSuite();

    // Go find all the tests with `Test` as an argument
    for t in __primitive('gatherTests') { // <--- compiler features needed
      testSuite.add(t);
    }

    for test in testSuite {
      try {
        // Create a test object per test
        var testObject = new Test();

        // Could filter some tests out here

        // option 1: if test is a FCF:
        test(testObject);
        // option 2: if test is a string:
        Reflection.call(test, testObject); // <--- compiler features needed
        // other options exist as well
      }
      // A variety of catch statements will handle errors thrown
      catch e: TestSkipped {
        // Print info on test skipped
      }
      catch e: TestDependencyNotMet {
        // Pop test out of array and append to end
      }
      catch ... { }
    }

  class Test {
    proc assert() { } // <--- compiler features wanted (but not needed)
    proc skipIf() { }
    proc dependsOn() { }
    proc fail() { }
    // ...

    // Could store pass/fail, test name, performance timings, dependency informations, etc.
  }

  class TestSuite() {
    // Stores array of functions to be run
  }

Notable compiler features required for this are:

Reflection.call()
- Call a procedure (or eventually method) with a set of arguments
__primitive('gatherTests')
- This is a compiler feature specific to UnitTest implementation
- Will need to ignore module scoping
- Return an array of functions (strings? FCFs?) that have the signature of f(test: Test) throws { }
- We could optionally support a generalized form of this in the Reflection module that finds all functions of a specific signature
Assertion introspection
- The ability to print the expression passed to the assert function, to give more informative failure messages
  - This could be implemented for Assert module asserts or UnitTest-specific asserts.

e.g.

// Assertion introspection example
assert(x + 2 > 3);
// prints:
//  assertion failed:
//    assert(x + 2 > 3)
//    assert(2 > 3)

Some other considerations:

Test is a class so that users could potentially extend it.
There are a number of ways gatherTests could work beyond this example.
Modules to be tested could be given through a command line argument from test launcher.
User code only needs to use UnitTest to access the Test type in their test signatures.

The test launcher is an entirely separate program that will run the test program and parse stdout/stderr for information that it can print to user and use as feedback for the test loop. This proposal suggests we implement this test launcher within mason test.

The information parsed would include:

What test is currently running
The outcome of the test (pass? fail? skip? dependency not met? incorrect number of locales?)
- If halt, launch the test program again with the test that halted and all the tests already run disabled
- If incorrect number of locales, queue a rerun with the required number of locales for the tests that require it.

Example 2: Test Metadata

use UnitTest;

proc s1(test: Test) throws {
}

proc s2(test: Test) throws {
  test.skipIf(here.maxTaskPar < 2);          // throws "TestSkipped"
  test.dependsOn('s1');                      // throws "TestDependencyNotMet"
}

# Compiles and runs
mason test testProgram.chpl

=========================== test session starts ============================
collected 2 tests
========================= 2 passed in 0.09 seconds =========================

# If the skipIf was true:

mason test testProgram.chpl

=========================== test session starts ============================
collected 2 tests

testProgram.chpl s2(test: Test) SKIPPED
========================= 1 passed, 1 skipped in 0.09 seconds ==============

Some corner cases to consider:

A test with N locales cannot depend on a test with M locales, where N != M.
- Ideally, we'd print a warning or error about this early on.
A test that depends on another test that failed or halts will be skipped
- Ideally, we'd print a warning about this when the test halted/failed.

Example 3: Multilocality

use UnitTest;

// This test requires 8 locales
proc s1(test: Test) throws {
  test.numLocales(8);
}

// This test can run with 2-4 locales
proc s2(test: Test) throws {
  test.maxLocales(4);
  test.minLocales(2);
}

// This test can run with 8 or 16 locales
proc s3(test: Test) throws {
  test.numLocales(8);
  test.numLocales(16);
}

// This test require 8 and 16 locales, so we wrap it with s4 and s5
proc t(test: Test) throws {
  test.ignore(); // UnitTest.main does not consider this as a test
}

proc s4(test: Test) throws {
  test.numLocales(8);
  t(test);
}

proc s5(test: Test) throws {
  test.numLocales(16);
  t(test);
}

When the test launcher encounters an TestIncorrectNumLocales error, it will queue up another run with the correct number of locales (accounting for tests that have already been run).

Specifying multiple numLocales could be an error, or treated as a logical "or", i.e. run with either configurations, like the example above suggests.

Other Features

Some features not demonstrated in this proposal:

Support class/record methods
- This proposal only discusses writing tests as functions. It would be nice to support class and record methods eventually, especially for test suite organization. However, I consider this a separable feature extension that can be tackled in the future.
Test suites
- Users should be able to organize tests into suites and explicitly run with them enabled/disabled.
- This will be handled at the module-level and eventually at the class-level too
Test parameterization
- Testing a function with multiple sets of arguments is a common use-case.
- We might consider ways to improve looping over a set of arguments/results.
Handling compiler errors
- May require a mechanism of compiling with specific tests explicitly pruned
Testing parts of a private module
- May require some magic function in standard library that allows accessing private module.
Parallel testing
- i.e. running multiple tests in parallel, rather than running parallel test programs
Distributed testing
- i.e. running multiple tests across a distributed system, rather than running distributed test programs
Running without a test launcher
- Would it ever make sense to do this?
- What would this interface look like for a user?

bradcray commented 5 years ago

Some random reactions (a few of which reflect comments from others that resonated with me):

[Disclaimer: I've never worked in a language/framework/team that's made use of unit testing, so am pretty ignorant here.]

Due to the importance of testing and the compiler's involvement, I could imagine this becoming a language feature over time if/when we gained confidence in our direction, but I think that starting with a no-language-change approach is prudent in the meantime.
I worry a little bit about the approach for specifying numLocales as (to my understanding) it feels a little passive/reactive/roundabout ("You called me and BTW I need this many locales, so maybe you should run me again with more if you didn't have that many") rather than proactive ("As this test's / package's authors, I think these tests need to be run with this many locales, so please do that.")
Similarly, the dependence-between-tests approach feels a little weak to me (e.g., the use of a string to name the test routine when we could just refer to the routine directly; a question in my mind about whether the one test should just call the other if there's a dependence between them rather than creating a constraint graph for the test harness to solve.
It seems to me that current FCF support could plausibly be sufficient for what we need here.
While I think making sure this is well-integrated with mason is a "must", my intuition is that there ought to be a way to invoke the testing outside of mason as well (?). Put another way, it's clear to me that mason ought to have a path to running unit tests in code; it's not clear to me why mason would be necessary to run unit tests in code.
I think the biggest thing that worried me when Ben was walking us through this was the statement that the unit test framework wouldn't be able to use the user's modules, so would need some special way to get access to their namespaces... This set off alarms, though I understand the challenge (my head goes to "Maybe the UnitTest.chpl module could / should be dynamically generated or modified by the compiler / mason when building unit tests?"

mppf commented 5 years ago

I think the biggest thing that worried me when Ben was walking us through this was the statement that the unit test framework wouldn't be able to use the user's modules, so would need some special way to get access to their namespaces... This set off alarms, though I understand the challenge (my head goes to "Maybe the UnitTest.chpl module could / should be dynamically generated or modified by the compiler / mason when building unit tests?"

Yep, it'd be possible for example for buildDefaultFunctions (or some other phase in the compiler) to add to module init code some code to populate a global list of test FCFs with any test function in that module. I don't think this changes very much what the test launcher / test library needs though (and as a result havn't considered it a key part of the above proposal for general direction).

I.e. the compiler could generate per-module code like this:

use UnitTest;

// User-written tests
proc t1(test: Test) { ... }
proc t2(test: Test) { ... }

// Generated code at the end of module-init
UnitTest.registerTest(t1);
UnitTest.registerTest(t2);

UnitTest.registerTest would just create a global list of test FCFs that could be run later.

vasslitvinov commented 5 years ago

If we are to have a collection of testable things, as testSuite above, it is best built by the compiler during compilation. Think of the Locales array: Chapel code sees is as an already-populated array. Similarly, the compiler can emit code to build it, then the testing framework can access testSuite with all elements already there. Its size can probably be a compile-time constant.

lydia-duncan commented 5 years ago

In the meeting with Krishna today, I mentioned that I thought it would be helpful to have a flag that users could use when running the launcher to say not to try tests again with a different number of locales than what they sent to the program.

Normally, when a test that specifies 4 locales gets attempted with 8 due to the launcher's argument, we would give an error about the incorrect number of locales, and then run the test again with the correct number. With this flag that I am suggesting, the user would be able to say "no, just tell me if the test can't run with that number of locales and don't do anything else".

krishnadey30 commented 5 years ago

I have implemented the depends on functionality and wanted to know what should be the done when a test depends on other test and have different requirement of numLocales?


proc s1(test: Test) throws {
  test.addNumLocales(6);
}

proc s2(test: Test) throws {
  test.dependsOn(s1);
  test.addNumLocales(16);
}

krishnadey30 commented 5 years ago

@bradcray Thanks for the comments. I tried implementing dependsOn method and was successful in doing so. What I think directly calling one test from another can lead to some issues. For example how to handle a case when we these two tests have mismatched numLocales which I mentioned in my previous comment. This can lead to a infinite loop and I was not able to find a solution for this. Waiting for feedback. Thanks.

ben-albrecht commented 5 years ago

In the future, we may consider adding test metadata such as dependencies to entire test suites, which will be in the form of modules or classes/records. This approach will have the advantage of being defined separately from the test functions themselves such that the test launcher can process the dependencies, number of locales, etc. before ever running a test function.

bradcray commented 5 years ago

@bradcray Thanks for the comments. I tried implementing dependsOn method and was successful in doing so. What I think directly calling one test from another can lead to some issues. For example how to handle a case when we these two tests have mismatched numLocales which I mentioned in my previous comment. This can lead to a infinite loop and I was not able to find a solution for this. Waiting for feedback. Thanks.

At a glance, this looks related to my concern about the passivity of how the number of locales is specified currently. I don't have any specific suggestions myself at this moment, though I like the flavor of what @ben-albrecht suggests just above.

krishnadey30 commented 5 years ago

Test parameterization will be very useful for the users. There are two approaches which I can think of for implementing it:

Creating a magic function in the compiler which will find out all the parameters in which the given test function should be run and then running the function for each parameter.
Creating a new method in Test Class which will take all the parameters and returns one parameter from these and mark the parameter done. Now we have to run this test function until all the parameters are marked done and we store the result accordingly.

Until this feature is implemented user can just loop through these parameters and run the test function with each of them.

ben-albrecht commented 5 years ago

A TODO for @krishnadey30 && @ben-albrecht - Split off remaining features to be implemented into standalone feature request issues, and eventually close this issue.

Some examples:

Module-base test suites
Class-based test suites
test-suite-wide metadata (suggested here)
Introspective assert (such that assert*() helper functions are not needed)
Handling compiler errors
Emit line numbers in assertion errors

russel commented 4 years ago

Just wondering if people have thought about property-based testing rather than example-based testing. unit_threaded allows for check in D. proptest and quickcheck are available for Rust. testing/quick is available for Go. Quickcheck in Haskell is the place where all this started. Hypothesis is the Python way of doing the right thing.

lydia-duncan commented 4 years ago

I hadn't, but it seems worth a separate issue

ben-albrecht commented 4 years ago

Agreed. @russel - would you be interested in opening an issue on that topic as a place for further discussion?

We intend to close/archive this issue, and spin off future unit test design discussions into separate issues.

krishnadey30 commented 4 years ago

I think it will be a good feature. We generally test based on the assumptions which we made while writing the code and the tests sometimes miss the bug. Having a feature testing can solve this issue to some extent.

russel commented 4 years ago

@ben-albrecht I cheated and just did a cut and paste to a new issue, #15488 . :-)

DanilaFe commented 4 months ago

I believe that the proposal as described in this issue is present today, in some form, in the UnitTest module. On main: https://github.com/chapel-lang/chapel/blob/main/modules/packages/UnitTest.chpl. I am seeing most of the features described in the issue opening, including the locality features.

mppf commented 4 months ago

Yes I think this issue can be closed

chapel-lang / chapel