CatalaLang / catala

Programming language for literate programming law specification
https://catala-lang.org
Apache License 2.0
1.96k stars 80 forks source link

Improve the output of clerk test #623

Closed denismerigoux closed 2 months ago

denismerigoux commented 3 months ago

As of now, the typical output of clerk test <folder< is :

[27/27] <test> 'tests@test'

[PASS] tests:   5/5

Where 5 is the number of files containing one or more tests found inside . If there is a test failure, then what is shown is:

[21/27] <test> tests/benefices_non_commerciaux.catala_fr
--- reference
+++ current-output
@@ -159,7 +159,7 @@
 ┌─[RESULT]─
 │ sortie =
 │   Impot_revenu.BénéficesNonCommerciauxFoyerFiscal {
-│     -- résultats_liquidatin_bénéfices_non_commerciaux:
+│     -- résultats_liquidation_bénéfices_non_commerciaux:
 │       [
 │         Impot_revenu.BénéficesNonCommerciauxDéclarant {
 │           -- abattement_forfaitaire_micro_professionnel: 0,00 €
[27/27] <test> 'tests@test'
FAILED: tests@test 
out='tests@test' ; success=$( tr -cd 0 < '_build/tests@test' | wc -c ) ; total=$( wc -c < '_build/tests@test' ) ; pass=$( ) ; if test "$success" -eq "$total" ; then printf "\n[\033[32mPASS\033[m] \033[1m%s\033[m: \033[32m%3d\033[m/\033[32m%d\033[m\n" ${out%@test} $success $total ; else printf "\n[\033[31mFAIL\033[m] \033[1m%s\033[m: \033[31m%3d\033[m/\033[32m%d\033[m\n" ${out%@test} $success $total ; return 1 ; fi

[FAIL] tests:   4/5

However, because this command is the primary testing method we recommend for a typical Catala workflow, the output of the command should be improved to look better and provide more accurate information. Here is a list of improvements that could be made :

I suspect the relevant code to tweak is here for these improvements :

https://github.com/CatalaLang/catala/blob/e7853d69cf1f258142ef6d23a0bdd083d7e2d14e/build_system/clerk_driver.ml#L564-L566

https://github.com/CatalaLang/catala/blob/e7853d69cf1f258142ef6d23a0bdd083d7e2d14e/build_system/clerk_driver.ml#L580-L600

AltGr commented 3 months ago

Better test output

As you can see in the last chunk of code you linked, the way to report the status is quite ugly: since running the tests is handled by ninja, we just use a special rule at the end for reporting the results. At the moment, this rule is just a short shell snippet ; this can be seen in _build/clerk.ninja after running clerk in debug mode:

rule test-results
  command = out=${out} ; success=$$( tr -cd 0 < ${in} | wc -c ) ; total=$$( wc -c < ${in} ) ; pass=$$( ) ; if test "$$success" -eq "$$total" ; then printf "\n[PASS]$ %s:$ %3d/%d\n" $${out%@test} $$success $$total ; else printf "\n[FAIL]$ %s:$ %3d/%d\n" $${out%@test} $$success $$total ; return 1 ; fi
  description = <test> ${out}

⇒ A better way to handle this would be to implement a clerk report internal subcommand (we already have clerk runtest in this category) that could do that more cleanly with OCaml code, and gets called by this rule.

We don't have at the moment the information about how many tests there were in each file though: testing proceeds in 4 steps:

The "generating output + then diffing" scheme has the merit of being simple and decoupling things well ; but if we want finer reporting on individual tests within the same file, we'll have to reimplement diffing directly into clerk runtest and merge this steps together, so it's not a trivial change. Adding more information in the intermediate @testfiles wouldn't be difficult though once we can use OCaml to process them.

A quick placeholder could be to count the hunks in the patch but that'll always be very approximative.

Reporting diff

when there is a test failure, the output should be a clean listing of all tests that have failed, grouped by file, and not just the first test that failed.

This, on the other hand, is expected to already be the case. Could you point out the bug in more detail if you find it is not ? (Well, it would be the diff of each file that contains failed tests, but it should be fairly close, and maybe more concise)

AltGr commented 3 months ago

Conclusions of a short discussion with @denismerigoux :