Closed: denismerigoux closed this issue 2 months ago.
As you can see in the last chunk of code you linked, the way the status is reported is quite ugly: since running the tests is handled by `ninja`, we just use a special rule at the end to report the results. At the moment, this rule is just a short shell snippet; it can be seen in `_build/clerk.ninja` after running clerk in debug mode:
```ninja
rule test-results
  command = out=${out} ; success=$$( tr -cd 0 < ${in} | wc -c ) ; total=$$( wc -c < ${in} ) ; if test "$$success" -eq "$$total" ; then printf "\n[PASS]$ %s:$ %3d/%d\n" $${out%@test} $$success $$total ; else printf "\n[FAIL]$ %s:$ %3d/%d\n" $${out%@test} $$success $$total ; return 1 ; fi
  description = <test> ${out}
```
⇒ A better way to handle this would be to implement a `clerk report` internal subcommand (we already have `clerk runtest` in this category) that could do this more cleanly with OCaml code and get called by this rule.
We don't have the information about how many tests there were in each file at the moment, though: testing proceeds in 4 steps:

1. `clerk runtest` on a file generates the output file (whatever the number of tests in it, it will just run `catala` that many times)
2. the original file and the output file are diffed; this is done by the `post-test` ninja rule
3. the output code (0 or 1) is written to a `filename@test` file for tracing failures
4. the `@test` files are aggregated recursively

The "generating output + then diffing" scheme has the merit of being simple and of decoupling things well; but if we want finer reporting on individual tests within the same file, we'll have to reimplement diffing directly into `clerk runtest` and merge these steps together, so it's not a trivial change. Adding more information to the intermediate `@test` files wouldn't be difficult, though, once we can use OCaml to process them.

A quick placeholder could be to count the hunks in the patch, but that will always be very approximate.
> when there is a test failure, the output should be a clean listing of all the tests that have failed, grouped by file, and not just the first test that failed.

This, on the other hand, is expected to already be the case. Could you point out the bug in more detail if you find it is not? (Well, what is shown is the diff of each file that contains failed tests, but it should be fairly close, and maybe more concise.)
Conclusions of a short discussion with @denismerigoux:

- `clerk report` 👍🏿
- `clerk runtest` will write more detailed per-test information to the intermediate `@test` files, and:
  - `clerk report` will read them individually.
  - `clerk report` will leverage these detailed reports to list tested files in a predictable order, and to provide several verbosity levels (from the total count of failures/tests/files to a detailed list of tests per file and their individual status).
As of now, the typical output of `clerk test <folder>` is:

(output elided)

where `5` is the number of files containing one or more tests found inside. If there is a test failure, then what is shown is:

(output elided)

However, because this command is the primary testing method we recommend for a typical Catala workflow, its output should be improved to look better and to provide more accurate information. Here is a list of improvements that could be made:

- instead of displaying `5/5`, it should display something like `37/37 tests across 5 files`
I suspect the relevant code to tweak for these improvements is here:
https://github.com/CatalaLang/catala/blob/e7853d69cf1f258142ef6d23a0bdd083d7e2d14e/build_system/clerk_driver.ml#L564-L566
https://github.com/CatalaLang/catala/blob/e7853d69cf1f258142ef6d23a0bdd083d7e2d14e/build_system/clerk_driver.ml#L580-L600