Pre-SIP: Spin App Testing Framework

The following is an exploratory pre-SIP on testing Spin applications.

High-Level Needs

At a high level, any testing mechanism for Spin needs the following:

Test Definitions: Users must be able to define the tests that they want to run. This means supporting the following:
- Test initialization: doing some amount of work before the test begins
- Input initialization: initializing the inputs to the trigger invocation (e.g., creating the HTTP request that will be fed to the HTTP trigger)
- HostComponent mock configuration: customizing the behavior of HostComponent mocks so that they return customized values when invoked.
- Assertions: asserting that the values passed into and returned from HostComponents match expected values, and that the return value from the trigger matches the expected value.
- Test deinitialization: we may also wish to provide some way for the user to invoke some functionality after a test has run. This might be especially important when dealing with actual external resources (e.g., databases) that might require some sort of cleaning up.
Trigger Invocation: There needs to be some way to invoke a trigger handler in memory instead of going through the full end-to-end trigger flow. For example, for the HTTP trigger there should be some way to generate a Request value that gets fed into the trigger, rather than having Spin start an actual web server and invoking the trigger through a real HTTP request that flows through the entire network stack.
HostComponent Mocks: Instead of requiring that the HostComponent implementations interact with real external resources (e.g., the OutboundRedisComponent communicating with an actual Redis instance), we should provide the ability to mock that functionality. This would likely need to be optional, as some users may wish for their tests to interact with actual external resources.
Test Runner: For a collection of test definitions, the user should be able to run those tests and receive well-formatted output on whether each test passed or not.
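To make the Trigger Invocation and HostComponent Mocks needs concrete, here is a minimal Rust sketch. It assumes a hypothetical `Request`/`Response` pair and a hypothetical `KeyValue` trait standing in for a HostComponent; none of these are Spin APIs. The handler is called as a plain function with an in-memory request, and the key-value store is a mock, so no web server or real store is involved.

```rust
use std::collections::HashMap;

// Hypothetical in-memory stand-ins for the HTTP trigger's input and output.
struct Request {
    path: String,
}

struct Response {
    status: u16,
    body: String,
}

// Hypothetical stand-in for a key-value HostComponent.
trait KeyValue {
    fn get(&self, key: &str) -> Option<String>;
}

// Mock implementation: returns canned values instead of talking to a real store.
struct MockKeyValue {
    data: HashMap<String, String>,
}

impl KeyValue for MockKeyValue {
    fn get(&self, key: &str) -> Option<String> {
        self.data.get(key).cloned()
    }
}

// The "app under test": looks up the request path (minus the leading '/') in the store.
fn handle(req: &Request, kv: &dyn KeyValue) -> Response {
    let key = req.path.trim_start_matches('/');
    match kv.get(key) {
        Some(value) => Response { status: 200, body: value },
        None => Response { status: 404, body: String::from("not found") },
    }
}

fn main() {
    let kv = MockKeyValue {
        data: HashMap::from([(String::from("foo"), String::from("bar"))]),
    };
    // Invoke the handler directly with an in-memory request: no network stack.
    let resp = handle(&Request { path: String::from("/foo") }, &kv);
    println!("{} {}", resp.status, resp.body);
}
```

Because the app only sees the `KeyValue` trait, the same handler could be wired to a real store in production and to the mock in tests.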

Nice to Haves

In addition to the high-level needs above, the test trigger would preferably have the following functionality:

The ability to write test definitions in the user's language of choice
Full customization of HostComponent mocks: i.e., if users want to do something very different, they can provide their own HostComponent mock component instead of relying on the customization knobs and levers we provide.

Possible Implementation Idea

The following is a straw-man proposal for how we might be able to support testing. This implementation is an attempt to give users the ability to write test suite orchestrators as Wasm components.

Test definition component

Users write each test definition as a component that has the following shape:

```
world http-trigger-test {
  use types.{test-error};

  // The test can call this function to invoke the HTTP trigger.
  // A no-op if called outside of a test.
  import wasi:http/outbound-handler;

  // The test suite that will be run by the test runner
  export test-suite;
}

interface test-suite {
  use wasi:http/types.{outgoing-request, incoming-response};
  use types.{test-error};

  // Any set-up that needs to be run before any tests are performed
  set-up: func() -> result<_, test-error>;

  // Get the next test to run
  //
  // Returns `none` when there are no more tests to run
  next-test: func() -> option<test>;

  // Perform any clean-up that might be needed after all tests have run
  tear-down: func() -> result<_, test-error>;

  // WIT resources must be declared inside an interface (or world)
  resource test {
    // An identifier for the test, used for things like filtering
    name: func() -> string;

    // Run the test and return an error if it fails
    run: func() -> result<_, test-error>;
  }
}

// Various types
interface types {
  // A test has errored
  record test-error {
    message: string,
    span: span,
  }

  // The source location where the test has errored
  record span {
    file: option<string>,
    line: option<u32>,
    column: option<u32>,
  }
}
```
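The lifecycle implied by `test-suite` (call `set-up` once, pull tests with `next-test` until it returns `none`, run each one, then call `tear-down`) can be sketched from the runner's side. The `TestSuite` and `Test` traits below are hypothetical host-side mirrors of the WIT interface, not generated bindings, and the error record omits the span for brevity.

```rust
// Hypothetical host-side mirror of the WIT `test-error` record (span omitted).
#[derive(Debug, PartialEq)]
struct TestError {
    message: String,
}

// Mirrors the `test` resource.
trait Test {
    fn name(&self) -> String;
    fn run(&mut self) -> Result<(), TestError>;
}

// Mirrors the `test-suite` interface.
trait TestSuite {
    fn set_up(&mut self) -> Result<(), TestError>;
    fn next_test(&mut self) -> Option<Box<dyn Test>>;
    fn tear_down(&mut self) -> Result<(), TestError>;
}

type Report = Vec<(String, Result<(), TestError>)>;

// Drives one suite and returns (test name, result) pairs in execution order.
fn run_suite(suite: &mut dyn TestSuite) -> Result<Report, TestError> {
    // If set-up fails, no tests (and no tear-down) are run.
    suite.set_up()?;
    let mut results = Vec::new();
    while let Some(mut test) = suite.next_test() {
        let name = test.name();
        let result = test.run();
        results.push((name, result));
    }
    // Tear-down runs once all tests have run, even if some of them failed.
    suite.tear_down()?;
    Ok(results)
}

// A toy test that passes when `expected == actual`.
struct AssertEq {
    name: &'static str,
    expected: i32,
    actual: i32,
}

impl Test for AssertEq {
    fn name(&self) -> String {
        self.name.to_string()
    }
    fn run(&mut self) -> Result<(), TestError> {
        if self.expected == self.actual {
            Ok(())
        } else {
            Err(TestError { message: format!("expected {}, got {}", self.expected, self.actual) })
        }
    }
}

// A toy suite that hands out its queued tests one at a time.
struct QueueSuite {
    tests: Vec<Box<dyn Test>>,
    torn_down: bool,
}

impl TestSuite for QueueSuite {
    fn set_up(&mut self) -> Result<(), TestError> {
        Ok(())
    }
    fn next_test(&mut self) -> Option<Box<dyn Test>> {
        if self.tests.is_empty() { None } else { Some(self.tests.remove(0)) }
    }
    fn tear_down(&mut self) -> Result<(), TestError> {
        self.torn_down = true;
        Ok(())
    }
}

fn main() {
    let mut suite = QueueSuite {
        tests: vec![Box::new(AssertEq { name: "adds", expected: 2, actual: 1 + 1 }) as Box<dyn Test>],
        torn_down: false,
    };
    for (name, result) in run_suite(&mut suite).expect("set-up or tear-down failed") {
        println!("{name}: {}", if result.is_ok() { "ok" } else { "FAILED" });
    }
}
```

Pulling tests one at a time through `next-test`, rather than returning a list up front, lets the suite create each test lazily.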

Each test suite is run against a given Spin application.

This provides the user with the ability to:

run initialization code that is shared across multiple tests
initialize individual tests
initialize the trigger input for each individual test
make assertions on the return value from the trigger invocation
run deinitialization code after each individual test has run
run deinitialization code after the entire test suite has run

We could also provide language SDKs for writing tests that handle some of this boilerplate for you, but, as with the SDKs for Spin components, using them would be optional.
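As an illustration of the boilerplate such an SDK might absorb, the Rust sketch below (hypothetical; not an existing Spin SDK) lets a test author register named closures and hands them out in order, the way an SDK could implement `next-test` on the author's behalf. `Suite`, `test`, and `assert_status` are illustrative names only.

```rust
// Just a message here; the WIT `test-error` record also carries a span.
type TestResult = Result<(), String>;
type TestFn = Box<dyn FnMut() -> TestResult>;

// Hypothetical SDK helper that collects named test closures.
struct Suite {
    tests: Vec<(String, TestFn)>,
}

impl Suite {
    fn new() -> Self {
        Suite { tests: Vec::new() }
    }

    // Register a test under a name used for reporting and filtering.
    fn test(mut self, name: &str, f: impl FnMut() -> TestResult + 'static) -> Self {
        let boxed: TestFn = Box::new(f);
        self.tests.push((name.to_string(), boxed));
        self
    }

    // What the SDK would expose as `next-test`: tests in registration order, then `None`.
    fn next_test(&mut self) -> Option<(String, TestFn)> {
        if self.tests.is_empty() {
            None
        } else {
            Some(self.tests.remove(0))
        }
    }
}

// Hypothetical assertion helper that returns an error instead of panicking.
fn assert_status(actual: u16, expected: u16) -> TestResult {
    if actual == expected {
        Ok(())
    } else {
        Err(format!("expected status {expected}, got {actual}"))
    }
}

fn main() {
    let mut suite = Suite::new()
        .test("returns 200", || assert_status(200, 200))
        .test("returns 404", || assert_status(404, 200));
    while let Some((name, mut run)) = suite.next_test() {
        println!("{name}: {:?}", run());
    }
}
```

Returning errors rather than panicking keeps a failing assertion from trapping the whole test component.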

Test Isolation

For each test suite, the Spin app under test and the test definition component only need to be compiled once. For each test, a new store and instance are used for the Spin app invocation, ensuring isolation between tests.
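The isolation scheme can be modelled without any Wasm runtime: compilation produces an immutable artifact once, and every test instantiates fresh mutable state from it. In this minimal Rust sketch, `CompiledApp` and `Instance` are hypothetical stand-ins for a compiled component and a store/instance pair (in Wasmtime terms, roughly a `Component` plus a fresh `Store`).

```rust
use std::collections::HashMap;

// Hypothetical result of compiling the app once: immutable and shareable.
struct CompiledApp {
    initial_state: HashMap<String, i64>,
}

// Hypothetical per-test instance: owns its own mutable state (the "store").
struct Instance {
    state: HashMap<String, i64>,
}

impl CompiledApp {
    // Cheap relative to compilation: every call starts from the same pristine state.
    fn instantiate(&self) -> Instance {
        Instance { state: self.initial_state.clone() }
    }
}

impl Instance {
    // Simulated app behaviour: increment a counter and return its new value.
    fn increment(&mut self, key: &str) -> i64 {
        let counter = self.state.entry(key.to_string()).or_insert(0);
        *counter += 1;
        *counter
    }
}

fn main() {
    let app = CompiledApp { initial_state: HashMap::new() }; // "compiled" once
    for test in ["first", "second"] {
        let mut instance = app.instantiate(); // fresh store per test
        println!("{test}: counter = {}", instance.increment("hits"));
    }
}
```

Both tests observe a counter of 1, because state written by one test never reaches the next.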

WASI

The test suite has no access to the host system through WASI. As use cases for interacting with the host system become clearer, we may want to reconsider this restriction.

Test definition manifest

The test definition component is not sufficient to fully define a test. A test definition manifest must also be provided. We leave the exact schema of the manifest up for future bike-shedding, but it would include the following information:

The path to the `spin.toml` of the Spin app being tested
The path to the test definition component
Optional customization of HostComponent mocks. This happens in one of two ways:
- Static customization through the manifest. For example, a possible customization of the key-value interface could be:
```
# The key/value pairs the key-value store will be initialized with
[[existing-keys]]
foo = "bar"
# The writes that the key-value store is expected to see
# If a different set of writes happens, a test failure is raised.
[[expected-writes]]
baz = "qux"
```
- Paths to HostComponent component implementations (more on this below).
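The semantics of the static key-value customization can be sketched as follows: the mock is seeded from `existing-keys`, records every write, and at the end of a test compares the recorded writes against `expected-writes`. This Rust sketch is hypothetical; whether repeated writes to the same key should count separately is an open design question, and here only the last value written to each key is compared.

```rust
use std::collections::HashMap;

// Hypothetical mock of the key-value HostComponent, configured from the manifest.
struct KvMock {
    store: HashMap<String, String>,
    writes: HashMap<String, String>,
    expected_writes: HashMap<String, String>,
}

impl KvMock {
    fn new(existing_keys: HashMap<String, String>, expected_writes: HashMap<String, String>) -> Self {
        KvMock { store: existing_keys, writes: HashMap::new(), expected_writes }
    }

    fn get(&self, key: &str) -> Option<&String> {
        self.store.get(key)
    }

    fn set(&mut self, key: &str, value: &str) {
        self.store.insert(key.to_string(), value.to_string());
        // Only the last value written to each key is kept for the comparison.
        self.writes.insert(key.to_string(), value.to_string());
    }

    // Called when the test ends: Err(message) means the test should fail.
    fn check(&self) -> Result<(), String> {
        if self.writes == self.expected_writes {
            Ok(())
        } else {
            Err(format!("expected writes {:?}, saw {:?}", self.expected_writes, self.writes))
        }
    }
}

// Convenience for building the maps the manifest would provide.
fn pairs(items: &[(&str, &str)]) -> HashMap<String, String> {
    items.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}

fn main() {
    let mut kv = KvMock::new(pairs(&[("foo", "bar")]), pairs(&[("baz", "qux")]));
    // The app under test reads a seeded key and writes an expected one.
    let seen = kv.get("foo").cloned();
    kv.set("baz", "qux");
    println!("read {:?}, check = {:?}", seen, kv.check());
}
```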

Custom Component `HostComponent`s

As stated above, the user may optionally provide paths to Wasm components that act as completely custom HostComponent implementations. These components export the interfaces they are mocking and are used by the test runner as the host implementation for the given mocked Spin interface.

The WIT for such a component could look like this:

```
world host-component {
  use types.{toml-table, config-error, test-error};

  // One or more mocked interfaces
  export fermyon:spin/llm;

  // Configuration for this specific `host-component`
  export configuration: func(config: toml-table) -> result<_, config-error>;

  // Life-cycle hook called when a test begins
  export test-begun: func();

  // Life-cycle hook called when a test is over
  export test-ended: func();

  // Allows the `host-component` to fail the test if some assertion is not met
  import fail: func(error: test-error);
}

interface types {
  // `test-error` (and its `span`) are assumed to be the same records as in
  // the test definition's `types` interface.

  variant toml-table {
    // TODO: this should be some loosely typed structured data that
    // is passed directly from the test definition manifest to the host-component
    // The host-component parses this config and configures itself based
    // on the data passed.
  }

  variant config-error {
    invalid-config,
    // TODO: bikeshed what this error type looks like
  }
}
```

The `configuration` export allows the component to be configured based on data from the test definition manifest. The shape of this configuration is specified by the host-component component. With this functionality, users can provide generic mocks that can be shared with the entire Spin community, which should make writing tests even easier.
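From the runner's point of view, a custom mock is something it can configure, notify at the start and end of each test, and that can report failures through `fail`. The hypothetical Rust sketch below models that contract with a trait and a mock LLM that returns a configured canned answer and fails the test if it was never called. A flat `HashMap<String, String>` stands in for the still-undecided `toml-table`, `fail` is modelled as pushing onto a list of messages, and `infer` is a simplified, assumed signature rather than the real `fermyon:spin/llm` one.

```rust
use std::collections::HashMap;

// Stand-in for the undecided `toml-table` type.
type Config = HashMap<String, String>;

#[derive(Debug, PartialEq)]
enum ConfigError {
    InvalidConfig,
}

// Hypothetical host-side view of a `host-component` world.
trait HostComponentMock {
    fn configure(&mut self, config: &Config) -> Result<(), ConfigError>;
    fn test_begun(&mut self);
    // `fail` is modelled as a list the mock can push failure messages onto.
    fn test_ended(&mut self, fail: &mut Vec<String>);
}

// A mock LLM: answers every prompt with a configured response.
#[derive(Default)]
struct MockLlm {
    response: String,
    calls: u32,
}

impl MockLlm {
    // Simplified, hypothetical inference call made by the app under test.
    fn infer(&mut self, _prompt: &str) -> String {
        self.calls += 1;
        self.response.clone()
    }
}

impl HostComponentMock for MockLlm {
    fn configure(&mut self, config: &Config) -> Result<(), ConfigError> {
        // The mock, not the runner, decides what its configuration looks like.
        self.response = config.get("response").cloned().ok_or(ConfigError::InvalidConfig)?;
        Ok(())
    }
    fn test_begun(&mut self) {
        self.calls = 0;
    }
    fn test_ended(&mut self, fail: &mut Vec<String>) {
        if self.calls == 0 {
            fail.push(String::from("llm mock was never called"));
        }
    }
}

fn main() {
    let mut llm = MockLlm::default();
    let config = Config::from([(String::from("response"), String::from("hello"))]);
    llm.configure(&config).expect("valid config");
    let mut failures = Vec::new();
    llm.test_begun();
    println!("{}", llm.infer("Say hi"));
    llm.test_ended(&mut failures);
    println!("failures: {failures:?}");
}
```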

Built-in HostComponent mocks can be built in exactly the same way as these custom ones.

Test Runner

The test runner could simply be a `spin test` command that looks for a directory of test definition manifests and runs them. For each manifest, the runner would load the test component and the Spin application, and configure the Spin runtime to use the HostComponents defined in the manifest.
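The discovery step could be as simple as listing a directory. This Rust sketch is hypothetical: it assumes manifests are files ending in `.toml` directly inside a single directory (the proposal leaves the layout open) and returns them sorted so that runs are deterministic.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Returns every `*.toml` file directly inside `dir`, sorted by path.
fn find_manifests(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut manifests = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_file() && path.extension().map_or(false, |ext| ext == "toml") {
            manifests.push(path);
        }
    }
    manifests.sort();
    Ok(manifests)
}

fn main() {
    // Assumed default location; a real CLI would likely make this configurable.
    match find_manifests(Path::new("tests")) {
        Ok(manifests) => {
            for manifest in manifests {
                println!("would run {}", manifest.display());
            }
        }
        Err(err) => eprintln!("could not read test directory: {err}"),
    }
}
```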

If the test component returns ok and none of the HostComponents invoke the fail import, the test passes. Otherwise, the failure message is displayed to the user.
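The pass/fail rule reduces to a small function: a test passes only if it returned `ok` and no HostComponent called `fail` while it ran. In this hypothetical Rust sketch, failures from both sources are collected so that all of them can be shown to the user, not just the first.

```rust
#[derive(Debug, PartialEq)]
enum Outcome {
    Passed,
    Failed(Vec<String>),
}

// `run_result` is what the test returned; `fail_calls` are the messages
// HostComponents passed to `fail` while the test was running.
fn outcome(run_result: Result<(), String>, fail_calls: Vec<String>) -> Outcome {
    let mut failures = Vec::new();
    if let Err(message) = run_result {
        failures.push(message);
    }
    failures.extend(fail_calls);
    if failures.is_empty() {
        Outcome::Passed
    } else {
        Outcome::Failed(failures)
    }
}

fn main() {
    let result = outcome(Ok(()), vec![String::from("unexpected write to key `baz`")]);
    println!("{result:?}");
}
```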

In the future, we may want the test runner itself to use a component for handling test suite results. For example, the test runner could invoke a stdout output component to print results to stdout, or a JUnit output component to write results to JUnit-compatible files. The community may wish to provide different implementations for their own needs.
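One way to picture pluggable result handling is a single reporter interface with interchangeable implementations. This Rust sketch is hypothetical: `Reporter`, `StdoutReporter`, and `JunitReporter` are illustrative names, and the JUnit output is a minimal subset of the common `<testsuite>`/`<testcase>`/`<failure>` XML layout with no escaping of special characters.

```rust
// Result of one test: its name and, if it failed, the failure message.
struct TestResult {
    name: String,
    failure: Option<String>,
}

trait Reporter {
    fn report(&self, suite: &str, results: &[TestResult]) -> String;
}

// Human-readable output, one line per test.
struct StdoutReporter;

impl Reporter for StdoutReporter {
    fn report(&self, _suite: &str, results: &[TestResult]) -> String {
        results
            .iter()
            .map(|r| match &r.failure {
                None => format!("test {} ... ok\n", r.name),
                Some(msg) => format!("test {} ... FAILED: {}\n", r.name, msg),
            })
            .collect()
    }
}

// Minimal JUnit-style XML; a real implementation must escape `<`, `&`, `"`, etc.
struct JunitReporter;

impl Reporter for JunitReporter {
    fn report(&self, suite: &str, results: &[TestResult]) -> String {
        let failures = results.iter().filter(|r| r.failure.is_some()).count();
        let mut xml = format!(
            "<testsuite name=\"{}\" tests=\"{}\" failures=\"{}\">\n",
            suite,
            results.len(),
            failures
        );
        for r in results {
            match &r.failure {
                None => xml.push_str(&format!("  <testcase name=\"{}\"/>\n", r.name)),
                Some(msg) => xml.push_str(&format!(
                    "  <testcase name=\"{}\"><failure message=\"{}\"/></testcase>\n",
                    r.name, msg
                )),
            }
        }
        xml.push_str("</testsuite>\n");
        xml
    }
}

fn main() {
    let results = vec![
        TestResult { name: String::from("get-foo"), failure: None },
        TestResult { name: String::from("set-baz"), failure: Some(String::from("unexpected write")) },
    ];
    let reporters: [&dyn Reporter; 2] = [&StdoutReporter, &JunitReporter];
    for reporter in reporters.iter() {
        print!("{}", reporter.report("kv-suite", &results));
    }
}
```

In the component-based version, each reporter would be a separate component exporting the same interface, and the runner would pick one at startup.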

Other Thoughts

We have been talking about renaming HostComponent for a long time, since many find the term confusing. Since users will now need to think about the concept of "functionality that the Spin host provides" in order to create assertions and configure mocks, we may wish to find a term that is clearer and more immediately understandable.
Good error messages are hard, so we may want to be more prescriptive with the `fail` import's function signature: rather than taking just a message, it could also take a source-location value that describes where the assertion failure happened.
Much of this functionality is not Spin specific. Potentially over time, we can factor much of this out into a generic component testing framework.
While the above has examined component testing (i.e., testing of a component that has no explicit knowledge that it is under test), there is a lot of overlap with conformance testing where the component being invoked is testing the HostComponents. This is essentially the inverse of what this document examines: testing the runtime vs. testing a component. We're likely able to share a lot between these two types of tests so we should always keep in mind that conformance testing is also something we want to be able to do in the future.
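For the source-location idea, Rust already offers a building block that an SDK could use: a function marked `#[track_caller]` can call `std::panic::Location::caller()` to learn where it was called from. The sketch below shows a hypothetical SDK assertion, `assert_eq_at`, that fills a `span`-like record (here with numeric line and column) instead of panicking; other languages would need their own mechanism.

```rust
use std::fmt::Debug;
use std::panic::Location;

// Mirrors the `span` record: where an assertion failed.
#[derive(Debug, PartialEq)]
struct Span {
    file: Option<String>,
    line: Option<u32>,
    column: Option<u32>,
}

#[derive(Debug, PartialEq)]
struct TestError {
    message: String,
    span: Span,
}

// Hypothetical SDK assertion: on mismatch, reports the *caller's* location.
#[track_caller]
fn assert_eq_at<T: PartialEq + Debug>(actual: T, expected: T) -> Result<(), TestError> {
    if actual == expected {
        return Ok(());
    }
    let loc = Location::caller();
    Err(TestError {
        message: format!("expected {expected:?}, got {actual:?}"),
        span: Span {
            file: Some(loc.file().to_string()),
            line: Some(loc.line()),
            column: Some(loc.column()),
        },
    })
}

fn main() {
    if let Err(e) = assert_eq_at(404, 200) {
        println!("{} at {:?}:{:?}", e.message, e.span.file, e.span.line);
    }
}
```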
