Test runner wishlist

osa1 commented 4 years ago

This is a list of features/changes I'd like to see in the test runner (test/run.sh and test/Makefile) that would make my job much easier.

1. After running the tests, print a summary of failing tests. Something like:

   Failing tests:
   - shared-object [drun-run] [ic-ref-run]
   - another-failing-test [tc]

2. Allow running tests in parallel while still showing the outputs of failing tests and the summary described above. (Note: make quick isn't this.)

3. Implement a flag to show how to run a test without using run.sh. For example, there are a lot of tests that can't be compiled simply by calling moc <file path>, as the files need pre-processing. Similarly, I can't run any of the drun tests with drun.

   Just one idea to make this simpler: implement "prepare" scripts/programs that take a test path as an argument and generate pre-processed files in a new directory.

4. The test suite should run all tests even if tests in one subdirectory fail. Currently, if a test in a directory (say run-drun) fails, the rest of the tests in that directory still run, but the script stops after that directory. Instead, it should (be able to) collect all failures.

(more to come)
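The failure summary and the keep-going behavior described above can share one mechanism: record each failure together with its variant instead of exiting, and print everything grouped at the end. Below is a minimal Rust sketch under stated assumptions: the test list, the variant tags and the `run_variant` callback are hypothetical stand-ins for whatever actually invokes moc, drun or ic-ref, and each directory's tests are simply entries in the same list.

```rust
use std::collections::BTreeMap;

/// Outcome of running one test under one variant (e.g. "tc", "drun-run").
#[derive(Debug, Clone, Copy, PartialEq)]
enum Outcome {
    Pass,
    Fail,
}

/// Runs every (test, variant) pair and never stops early: a failure is
/// recorded and the loop moves on, so a failing directory cannot hide
/// failures in the directories after it.
fn run_all<F>(tests: &[(&str, &[&str])], mut run_variant: F) -> BTreeMap<String, Vec<String>>
where
    F: FnMut(&str, &str) -> Outcome,
{
    // BTreeMap keeps the summary sorted by test name.
    let mut failures: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for &(test, variants) in tests {
        for &variant in variants {
            if run_variant(test, variant) == Outcome::Fail {
                failures
                    .entry(test.to_string())
                    .or_insert_with(Vec::new)
                    .push(variant.to_string());
            }
        }
    }
    failures
}

/// Formats the failures in the "Failing tests:" shape proposed above.
fn summary(failures: &BTreeMap<String, Vec<String>>) -> String {
    if failures.is_empty() {
        return "All tests passed.\n".to_string();
    }
    let mut out = String::from("Failing tests:\n");
    for (test, variants) in failures {
        let tags: Vec<String> = variants.iter().map(|v| format!("[{}]", v)).collect();
        out.push_str(&format!("- {} {}\n", test, tags.join(" ")));
    }
    out
}

fn main() {
    // Hypothetical tests and variants; a stub decides which pairs fail.
    let tests: Vec<(&str, &[&str])> = vec![
        ("shared-object", &["tc", "drun-run", "ic-ref-run"][..]),
        ("another-failing-test", &["tc", "drun-run"][..]),
        ("passing-test", &["tc"][..]),
    ];
    let failing = [
        ("shared-object", "drun-run"),
        ("shared-object", "ic-ref-run"),
        ("another-failing-test", "tc"),
    ];
    let failures = run_all(&tests, |t, v| {
        if failing.iter().any(|&(ft, fv)| ft == t && fv == v) {
            Outcome::Fail
        } else {
            Outcome::Pass
        }
    });
    print!("{}", summary(&failures));
}
```

With the stub above this prints the two failing tests, sorted by name, each followed by its failing variants.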

These features would be very useful when debugging. For example, (1) is useful when I try to understand the effect of a change: I make a change and see whether it makes things better or worse, and which tests it fixes or breaks. That helps me understand what the change affects. (2) saves time. (3) allows me to run a test with modified dependencies (e.g. wasmtime, drun or ic-ref).
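Once every run ends with a list of failing tests, "which tests did my change fix or break" becomes a set difference between the run before the change and the run after it. A small Rust sketch, assuming failures are identified by hypothetical `test/variant` strings and that both runs covered the same tests:

```rust
use std::collections::BTreeSet;

/// Given the failing ids from a run before and after a change, returns
/// (fixed, broken): ids that stopped failing and ids that started failing.
fn diff_runs<'a>(
    before: &BTreeSet<&'a str>,
    after: &BTreeSet<&'a str>,
) -> (Vec<&'a str>, Vec<&'a str>) {
    let fixed: Vec<&'a str> = before.difference(after).copied().collect();
    let broken: Vec<&'a str> = after.difference(before).copied().collect();
    (fixed, broken)
}

fn main() {
    // Hypothetical failing sets from two runs.
    let before: BTreeSet<&str> = ["shared-object/drun-run", "bigint/tc"].iter().copied().collect();
    let after: BTreeSet<&str> = ["bigint/tc", "text/ic-ref-run"].iter().copied().collect();
    let (fixed, broken) = diff_runs(&before, &after);
    println!("fixed:  {:?}", fixed); // ["shared-object/drun-run"]
    println!("broken: {:?}", broken); // ["text/ic-ref-run"]
}
```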

I think one prerequisite for this stuff is probably replacing the shell script with a more maintainable language with good library support.
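To give a feel for what parallel runs with readable output need, here is a minimal Rust sketch using only the standard library's threads and channels. Each worker buffers a test's output instead of printing it, so concurrent outputs never interleave, and only failing outputs are shown at the end. `fake_test` is a hypothetical stand-in; a real runner would launch moc/drun, for example with `std::process::Command::output`, which captures stdout and stderr.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// One finished test: its name, whether it passed, and its captured output.
struct TestResult {
    name: String,
    passed: bool,
    output: String,
}

/// Runs `tests` on `jobs` worker threads pulling from a shared queue.
/// Results are sorted by name afterwards so the report is stable.
fn run_parallel(tests: Vec<String>, jobs: usize, run_one: fn(&str) -> (bool, String)) -> Vec<TestResult> {
    let queue = Arc::new(Mutex::new(tests));
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();
    for _ in 0..jobs.max(1) {
        let queue = Arc::clone(&queue);
        let tx = tx.clone();
        handles.push(thread::spawn(move || loop {
            // The lock guard is dropped at the end of this statement.
            let next = queue.lock().unwrap().pop();
            match next {
                Some(name) => {
                    let (passed, output) = run_one(&name);
                    tx.send(TestResult { name, passed, output }).unwrap();
                }
                None => break,
            }
        }));
    }
    drop(tx); // the receiver finishes once every worker has exited
    let mut results: Vec<TestResult> = rx.iter().collect();
    for h in handles {
        h.join().unwrap();
    }
    results.sort_by(|a, b| a.name.cmp(&b.name));
    results
}

/// Shows the buffered output of failing tests only, then the summary.
fn report(results: &[TestResult]) -> String {
    let mut out = String::new();
    for r in results.iter().filter(|r| !r.passed) {
        out.push_str(&format!("=== {} (output) ===\n{}", r.name, r.output));
    }
    let failing: Vec<&str> = results.iter().filter(|r| !r.passed).map(|r| r.name.as_str()).collect();
    out.push_str(&format!("{} passed, {} failed\n", results.len() - failing.len(), failing.len()));
    for name in failing {
        out.push_str(&format!("- {}\n", name));
    }
    out
}

/// Hypothetical stand-in for running a real test.
fn fake_test(name: &str) -> (bool, String) {
    (!name.contains("fail"), format!("ran {}\n", name))
}

fn main() {
    let tests = vec!["ok-1".to_string(), "will-fail".to_string(), "ok-2".to_string()];
    let results = run_parallel(tests, 4, fake_test);
    print!("{}", report(&results));
}
```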

chenyan-dfinity commented 4 years ago

I plan on trying https://github.com/rust-shell-script/rust_cmd_lib for Candid tests. Everything inside the macro is still shell script, and you can assign the result to a Rust variable as needed. It seems like an easy way to replace shell scripts.
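For comparison, capturing a shell command's result in a Rust variable is also possible with just the standard library. This sketch does not use rust_cmd_lib (its macros are not reproduced here); it calls `sh -c` through `std::process::Command`, and the helper name `sh` is hypothetical.

```rust
use std::io;
use std::process::Command;

/// Runs `script` with `sh -c` and returns its trimmed stdout, or an error
/// if the command could not start or exited with a non-zero status.
fn sh(script: &str) -> io::Result<String> {
    let output = Command::new("sh").arg("-c").arg(script).output()?;
    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("`{}` failed: {}", script, String::from_utf8_lossy(&output.stderr)),
        ));
    }
    Ok(String::from_utf8_lossy(&output.stdout).trim().to_string())
}

fn main() -> io::Result<()> {
    // The snippet stays shell; its result lands in an ordinary Rust variable.
    let entries = sh("ls | wc -l")?;
    println!("{} entries in the current directory", entries);
    Ok(())
}
```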

osa1 commented 4 years ago

Yeah, I think I recently saw that library on /r/rust. Let us know how it goes; perhaps we can use it to replace our run.sh.

nomeata commented 4 years ago

I’d like to add

a mode to run only the previously failing tests, very useful for iterating on tedious fixes/updates.

Since tasty brings nice UI for A, B and C, I’d be inclined to re-implement run.sh in Haskell with tasty. But probably everybody has their own favorite.

The question is: Is the pain already big enough? Do we want to shave that yak now?

chenyan-dfinity commented 4 years ago

a mode to run only the previously failing tests, very useful for iterating on tedious fixes/updates.

That's unsound. A fix can make previously passing tests fail. We want a dependency graph :)

nomeata commented 4 years ago

That's unsound. A fix can make previously passing tests fail. We want a dependency graph :)

tasty --rerun works like this:

If the previous run had failing tests, re-run those. This is usually what you want to do: some tests fail, you fix them.
If the previous run had no failing tests, run all of them again. So you just keep running tasty --rerun until you see no failing tests, and then once more.
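The rerun rule described above is small enough to sketch directly. This Rust version assumes a hypothetical state file holding one failing test name per line; it is not how tasty stores its state.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Picks the tests to run: the previous run's failures if there were any
/// (and they still exist), otherwise every test.
fn select_tests(all: &[String], state_file: &Path) -> Vec<String> {
    // A missing or unreadable state file counts as "no recorded failures".
    let previous: Vec<String> = fs::read_to_string(state_file)
        .map(|s| s.lines().filter(|l| !l.is_empty()).map(String::from).collect::<Vec<String>>())
        .unwrap_or_default();
    // Ignore recorded names that no longer exist (e.g. a deleted test).
    let still_present: Vec<String> = previous.into_iter().filter(|t| all.contains(t)).collect();
    if still_present.is_empty() {
        all.to_vec()
    } else {
        still_present
    }
}

/// Overwrites the state file with this run's failures; empty means all passed.
fn record_failures(failures: &[String], state_file: &Path) -> io::Result<()> {
    let mut text = failures.join("\n");
    if !text.is_empty() {
        text.push('\n');
    }
    fs::write(state_file, text)
}

fn main() -> io::Result<()> {
    let all: Vec<String> = ["a", "b", "c"].iter().map(|s| s.to_string()).collect();
    let state = std::env::temp_dir().join("rerun-sketch.txt");
    record_failures(&["b".to_string()], &state)?; // pretend "b" failed last time
    println!("{:?}", select_tests(&all, &state)); // ["b"]
    record_failures(&[], &state)?; // the rerun passed
    println!("{:?}", select_tests(&all, &state)); // ["a", "b", "c"]
    Ok(())
}
```

Because each run overwrites the state file, running until nothing fails and then "once more" naturally ends with a full run.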

Once you have tasted the convenience benefit of “rerun the failing tests”, you don’t want to miss it any more :-)

nomeata commented 4 years ago

In https://github.com/dfinity/motoko/pull/2133#discussion_r526788064 @kritzcreek suggests https://github.com/mirage/alcotest. So if we want to keep using OCaml (to avoid too much language proliferation), we could rewrite run.sh in OCaml based on alcotest and get many of the requested gimmicks.

osa1 commented 3 years ago

I'd like to start porting run.sh and the Perl scripts (and maybe some of the Makefiles as needed) to OCaml. I'm looking for a testing framework/library that allows, or maybe even implements, some of these:

- It should be possible to implement multiple variants of a test, and to filter on variants. Within a test, I should be able to behave differently depending on the current variant. Use case: I want to run a test with both drun and ic-ref, with and without sanity-check flags, GC scheduling flags, etc. Those would be different variants that pass different flags to moc (e.g. --sanity-checks, --force-gc).
  - This can be implemented by adding a prefix/suffix to test names, so perhaps no features are required from the test library.
- It should be possible to run tests in parallel and still get good output (ideally variants should also run in parallel).
- Failing tests should be listed at the end of the run. I shouldn't have to scroll back and forth to collect failing tests manually, or pipe the output to a file and then filter it.
- If a test/variant is skipped for any reason, the test runner should say why it was skipped (so many test runners make this mistake... it's very frustrating to see that a test is skipped but have no idea why).
- Also the stuff listed in my original message above.
- (Any good properties/features that we want to maintain? For example, I think it makes sense to declare tests in files instead of having test definitions in the test runner code, and to keep all flags and other configuration in the same file as comments.)
- (anything else?)
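Expanding tests into named variants, filtering on the resulting names, and attaching a reason to every skip can all happen before anything runs. A minimal Rust sketch: the `Variant` and `Plan` types, the `test/variant` naming and the skip predicate are hypothetical; the moc flags are the ones mentioned in the list above.

```rust
/// A variant: a name used as the test-name suffix plus the moc flags it adds.
struct Variant {
    name: &'static str,
    moc_flags: &'static [&'static str],
}

/// What the runner will do with one (test, variant) pair.
#[derive(Debug, PartialEq)]
enum Plan {
    Run { id: String, moc_flags: Vec<&'static str> },
    // Every skip carries a human-readable reason that gets printed.
    Skip { id: String, reason: String },
}

/// Expands each test into one entry per variant named "<test>/<variant>",
/// keeps only entries whose name contains `filter`, and asks `skip_reason`
/// whether each remaining entry must be skipped (and why).
fn plan<F>(tests: &[&str], variants: &[Variant], filter: &str, skip_reason: F) -> Vec<Plan>
where
    F: Fn(&str, &Variant) -> Option<String>,
{
    let mut plans = Vec::new();
    for &test in tests {
        for v in variants {
            let id = format!("{}/{}", test, v.name);
            if !id.contains(filter) {
                continue; // deselected by the user, not skipped
            }
            match skip_reason(test, v) {
                Some(reason) => plans.push(Plan::Skip { id, reason }),
                None => plans.push(Plan::Run { id, moc_flags: v.moc_flags.to_vec() }),
            }
        }
    }
    plans
}

fn main() {
    // Hypothetical variant set.
    let variants = [
        Variant { name: "drun", moc_flags: &[] },
        Variant { name: "drun-sanity", moc_flags: &["--sanity-checks"] },
        Variant { name: "ic-ref-force-gc", moc_flags: &["--force-gc"] },
    ];
    let plans = plan(&["shared-object", "bigint"], &variants, "", |test, v| {
        if test == "shared-object" && v.name.starts_with("ic-ref") {
            Some("hypothetical reason: not supported on ic-ref".to_string())
        } else {
            None
        }
    });
    for p in plans {
        match p {
            Plan::Run { id, moc_flags } => println!("RUN  {} (moc flags: {:?})", id, moc_flags),
            Plan::Skip { id, reason } => println!("SKIP {}: {}", id, reason),
        }
    }
}
```

Treating deselection by filter differently from skipping keeps the "why was this skipped?" output limited to cases the user did not ask for.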

@ggreif @claudio @kritzcreek any other features you'd like to see in a test runner? @kritzcreek do you know if alcotest makes these possible?

I have implemented a test runner in Rust before, using only std and a CLI argument parser, and it wasn't too difficult. OCaml's standard library is a bit lacking compared to Rust's, but I think it should still be possible to implement our own test runner from scratch if needed. Of course it probably won't have output as fancy, colorful and tabulated as alcotest's, but I personally don't care much about pretty output.
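As a sense of scale for a from-scratch runner, reading per-test configuration from comments at the top of each test file (as suggested in the list above) takes little more than string handling. The `//FLAGS:` and `//SKIP:` directive names here are hypothetical, not the existing test suite's syntax; requiring a reason on every skip addresses the complaint about unexplained skips.

```rust
/// Per-test configuration read from comment lines in the test source file.
#[derive(Debug, Default, PartialEq)]
struct TestConfig {
    moc_flags: Vec<String>,
    skip: Vec<(String, String)>, // (variant, reason)
}

/// Parses hypothetical `//FLAGS: ...` and `//SKIP: <variant> <reason>`
/// directives. Blank lines and ordinary comments are ignored; parsing stops
/// at the first non-comment line, so directives must sit at the top.
fn parse_config(source: &str) -> Result<TestConfig, String> {
    let mut cfg = TestConfig::default();
    for (i, raw) in source.lines().enumerate() {
        let line = raw.trim();
        if line.is_empty() {
            continue;
        }
        if !line.starts_with("//") {
            break; // first line of actual code
        }
        if let Some(rest) = line.strip_prefix("//FLAGS:") {
            cfg.moc_flags.extend(rest.split_whitespace().map(String::from));
        } else if let Some(rest) = line.strip_prefix("//SKIP:") {
            match rest.trim().split_once(' ') {
                Some((variant, reason)) if !reason.trim().is_empty() => {
                    cfg.skip.push((variant.to_string(), reason.trim().to_string()))
                }
                // A skip without a reason is rejected rather than accepted silently.
                _ => return Err(format!("line {}: //SKIP: needs a variant and a reason", i + 1)),
            }
        }
    }
    Ok(cfg)
}

fn main() {
    let source = "//FLAGS: --sanity-checks --force-gc\n//SKIP: ic-ref hypothetical reason\n\nlet x = 1;\n";
    match parse_config(source) {
        Ok(cfg) => println!("{:?}", cfg),
        Err(e) => eprintln!("{}", e),
    }
}
```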

osa1 commented 3 years ago

I've started replacing run.sh with OCaml in #2684.

dfinity / motoko

Test runner wishlist #1869