c-cube / qcheck

QuickCheck inspired property-based testing for OCaml.
https://c-cube.github.io/qcheck/
BSD 2-Clause "Simplified" License
340 stars 37 forks source link

Shrinker improvements to QCheck list, string, and function #242

Closed jmid closed 2 years ago

jmid commented 2 years ago

This PR contains the QCheck(1) shrinker improvements to list, string, and functions described in #235. The essential part extracted from https://github.com/c-cube/qcheck/pull/235#issuecomment-1103781517:

Executive summary:

  • the improved list, string, and function shrinkers collectively reduce the runtime of the benchmark from 78.402s to 5.456s on my laptop

The reduced shrink counts in, e.g., test/core/QCheck_expect_test.expected are nice, but beware that this only counts successful shrink attempts. https://github.com/c-cube/qcheck/pull/235#issuecomment-1103781517 documents how the PR reduces the total shrink attempts significantly, e.g.,

I would be grateful if someone could give this a quick spin with an opam pin of https://github.com/jmid/qcheck/tree/shrinker-improvements-list-string-fun on their regular QCheck tests before merging.

c-cube commented 2 years ago

Just a remark, but if this modifies shrinking, I don't see it having any impact on a successful test suite where 0 shrinking happens, correcT?

jmid commented 2 years ago

Just a remark, but if this modifies shrinking, I don't see it having any impact on a successful test suite where 0 shrinking happens, correcT?

Good point 👍 To observe any potential change, it will need to run on a test suite with negative tests (QCheck tests that are expected to fail)

jmid commented 2 years ago

@gasche - I know you've been hand-writing shrinkers in the past. I'd be grateful if you can find time to give this PR a quick spin - just so that someone non-Jan has tried it before merging... 😀

jmid commented 2 years ago

I just rebased and added a CHANGELOG entry. It's been two weeks since this went up. I therefore plan to merge shortly.

gasche commented 2 years ago

Apologies for the lack of reply. I tried to look at the shrinker I wrote last, it works on the AST of a small programming language, and there the current shrinkers are efficient enough that I couldn't measure a difference (I tried to inject bugs, but all examples get minimized in 10 steps or less; maybe the bugs were not subtle enough). The shrinker does not use int or list, only list, but the trickiness and bug-finding power comes from the recursive structure of the AST (mixing the right constructs together to exercice the bug), rather than the list shrinker itself (which is mostly used to reduce n-ary tuples).

vch9 commented 2 years ago

Looks great! The shrinks steps seems to be reduced in the vast majority of cases.

jmid commented 2 years ago

OK, following @vch9 pointing out a shrink count that went up, I dug, found out it was caused by a linear time char shrinker. Here's the benchmark results from replacing it. We are now down below 2secs from 78secs (it took 67secs pre-#240) :smile:

                                                         iteration seed 1234                   iteration seed 8743                   iteration seed 6789               total
Shrink test name                                  Q1/s  #succ/#att   Q2/s  #succ/#att   Q1/s  #succ/#att   Q2/s  #succ/#att   Q1/s  #succ/#att   Q2/s  #succ/#att    Q1/s   Q2/s
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
long_shrink                                       0.061  149/352     1.057 3039/3099    0.008  147/345     0.675 3068/3127    0.011  145/339     0.775 3063/3124    0.080  2.506
ints arent 0 mod 3                                0.000   84/216     0.000    2/2       0.000   71/204     0.000    1/1       0.000   88/215     0.000   88/305     0.000  0.000
ints are 0                                        0.000   62/63      0.000   61/123     0.000   61/62      0.000   61/122     0.000   62/63      0.000   61/123     0.000  0.000
nat < 5001                                        0.000    6/56      0.000    7/77      0.000    6/47      0.000    7/69      0.000    3/24      0.000    8/85      0.000  0.000
char never produces 'abcdef'                      0.000    0/0       0.000    1/1       0.000    0/0       0.000    0/0       0.000    0/0       0.000    0/0       0.000  0.000
strings are empty                                 0.000   15/24      0.000    8/16      0.000   20/29      0.000   13/26      0.000    7/15      0.000    1/2       0.000  0.000
string never has a \000 char                      0.000    8/23      0.000   22/167     0.000   17/35      0.003   56/254     0.000    6/19      0.000   15/48      0.000  0.003
string never has a \255 char                      0.000   14/34      0.001   59/318     0.000   15/30      0.002   97/529     0.001   21/44      0.004   41/194     0.002  0.007
strings have unique chars                         0.000   13/39      0.000   18/30      0.002   15/40      0.002   24/52      0.000    5/24      0.000   15/20      0.002  0.002
pairs have different components                   0.000    0/4       0.000    0/6       0.000    0/4       0.000    0/6       0.000    0/6       0.000    0/10      0.000  0.000
pairs have same components                        0.000  125/126     0.000   63/125     0.000  124/125     0.000   62/123     0.000  119/120     0.000   63/125     0.000  0.000
pairs have a zero component                       0.000  124/188     0.000  122/306     0.000  123/186     0.001  122/306     0.000  118/182     0.000  123/308     0.000  0.001
pairs are (0,0)                                   0.000  125/126     0.000   63/125     0.000  124/125     0.000   62/123     0.000  119/120     0.000   63/125     0.000  0.000
pairs are ordered                                 0.000  827/17626   0.000   94/1217    0.000  690/12946   0.000   85/865     0.000  687/13963   0.000   94/1326    0.001  0.000
pairs are ordered reversely                       0.000  124/125     0.000   62/124     0.000  123/124     0.000   62/124     0.000  122/123     0.000   62/124     0.000  0.000
pairs sum to less than 128                        0.000  116/129     0.000   56/126     0.000  120/146     0.000   59/138     0.000  119/141     0.000   57/130     0.000  0.000
pairs lists rev concat                            0.017  147/350     0.011   83/168     0.010  135/334     0.003   75/152     0.000  131/320     0.000   67/136     0.027  0.014
pairs lists no overlap                            0.001   26/48      0.006   27/60      0.000   18/39      0.004   18/41      0.000    8/20      0.000   11/28      0.001  0.010
triples have pair-wise different components       0.000    7/31      0.000    3/15      0.000    6/6       0.000    3/3       0.000    2/6       0.000    3/3       0.000  0.000
triples have same components                      0.000  188/252     0.000   64/127     0.000  177/240     0.000   64/128     0.000  182/246     0.000   62/122     0.000  0.000
triples are ordered                               0.000  188/252     0.000    3/4       0.000  177/178     0.000    3/4       0.000  187/250     0.000   91/1021    0.000  0.000
triples are ordered reversely                     0.000  188/189     0.000   64/126     0.000  177/240     0.000  124/247     0.000  182/183     0.000   65/127     0.000  0.000
quadruples have pair-wise different components    0.000   23/41      0.000    4/4       0.000   11/11      0.000    4/4       0.000   14/38      0.000    4/11      0.000  0.000
quadruples have same components                   0.000  250/377     0.000  126/313     0.000  237/424     0.000  115/292     0.000  242/425     0.000  123/307     0.000  0.000
quadruples are ordered                            0.000  251/315     0.000    5/6       0.000  239/240     0.000    4/5       0.000  244/308     0.000    5/6       0.000  0.000
quadruples are ordered reversely                  0.000  251/252     0.000   66/128     0.000  239/302     0.000  126/250     0.000  244/245     0.000   66/128     0.000  0.000
forall (a, b) in nat: a < b                       0.000   13/23      0.000    6/16      0.000   10/15      0.000    6/15      0.000    5/6       0.000    4/7       0.000  0.000
forall (a, b, c) in nat: a < b < c                0.000   15/22      0.000    3/7       0.000   26/53      0.000    7/28      0.000    9/9       0.000    3/3       0.000  0.000
forall (a, b, c, d) in nat: a < b < c < d         0.000   23/29      0.000    4/4       0.000   30/56      0.000    4/4       0.000   13/13      0.000    4/4       0.000  0.000
forall (a, b, c, d, e) in nat: a < b < c < d < e  0.000   28/28      0.000    5/5       0.000   33/33      0.000    5/5       0.000   14/14      0.000    5/5       0.000  0.000
forall (a, b, c, d, e, f) in nat: a < b < c < d   0.000   30/30      0.000    6/6       0.000   38/38      0.000    6/6       0.000   16/16      0.000    6/6       0.000  0.000
forall (a, b, c, d, e, f, g) in nat: a < b < c <  0.000   31/31      0.000    7/7       0.000   41/41      0.000    7/7       0.000   22/22      0.000    7/7       0.000  0.000
forall (a, b, c, d, e, f, g, h) in nat: a < b <   0.000   35/35      0.000    8/8       0.000   48/48      0.000    8/8       0.000   22/22      0.000    7/7       0.000  0.000
forall (a, b, c, d, e, f, g, h, i) in nat: a < b  0.000   42/42      0.000    9/9       0.000   55/55      0.000    9/9       0.000   26/26      0.000    8/8       0.000  0.000
bind ordered pairs                                0.000  125/125     0.000    1/1       0.000  124/124     0.000    1/1       0.000  120/120     0.000    1/1       0.000  0.000
bind list_size constant                           0.000   49/221     0.000   12/26      0.000   50/220     0.000   12/25      0.000   51/228     0.000   11/21      0.000  0.000
lists are empty                                   0.000   12/18      0.000    8/16      0.002   17/23      0.004   13/26      0.000    4/9       0.000    1/2       0.002  0.004
lists shorter than 10                             0.000   39/242     0.000   16/30      0.000   53/315     0.002   21/42      0.001   47/312     0.000   15/29      0.001  0.002
lists shorter than 432                            0.173 1632/19461   1.026  412/457     0.133 1677/20004   0.981  405/450     0.073 1735/20729   0.133  419/447     0.379  2.141
lists shorter than 4332                           0.092   13/67      4.305 4022/4087    0.113   10/35      5.572 4020/4067    0.016    8/55      3.941 4013/4055    0.220 13.818
lists equal to duplication                        0.188   26/35      0.464    4/7       0.000    6/12      0.000    3/6       0.013   25/35      0.128   17/35      0.200  0.592
lists have unique elems                           0.000    8/18      0.000   11/22      0.003   12/23      0.005   17/30      0.000    8/20      0.000   10/20      0.003  0.005
tree contains only 42                             0.000   10/10      0.000    2/2       0.000   10/10      0.000    2/2       0.000   12/13      0.000    2/2       0.000  0.000
fail_pred_map_commute                             0.001   77/237     0.001  122/342     0.000   73/223     0.000   19/83      0.000   87/268     0.000   14/39      0.001  0.001
fail_pred_strings                                 0.000    1/3       0.000    2/5       0.000    1/4       0.000    1/4       0.000    1/4       0.000    1/4       0.000  0.000
fold_left fold_right                              0.000   34/104     0.001   56/189     0.001   39/120     0.000   21/58      0.000   18/54      0.000   22/64      0.001  0.001
fold_left fold_right uncurried                    0.002   44/199     0.051  376/1550    0.039   55/170     3.052 2064/8057    0.000    5/15      0.000    4/17      0.041  3.104
fold_left fold_right uncurried fun last           0.001   34/101     0.001   56/189     0.001   42/129     0.003   72/170     0.000   19/57      0.000   28/73      0.001  0.004
fold_left test, fun first                         0.003   36/65      0.001   15/28      1.005   87/1129    3.468   47/9773    0.023   76/713     0.002   36/64      1.032  3.471
                                                                                                                                                                    1.995 25.689
jmid commented 2 years ago

OK, implemented string shrinker conversions which makes the shrinker benchmark run consistently in under 2 secs on my machine. String.fold_right is from OCaml 4.13 - so the CI is unhappy.

jmid commented 2 years ago

OK, I added a hand-rolled string_fold_right - and now this also builds and runs on 4.08 again.

jmid commented 2 years ago

OK, thanks for the review @vch9 :pray:

Overall, with the inclusion of the char improvement this is now down from 78secs to 2secs :partying_face: :champagne:

Green light to merge @vch9?

vch9 commented 2 years ago

OK, thanks for the review @vch9 🙏

* [b1e69f8](https://github.com/c-cube/qcheck/commit/b1e69f8ebd12a777ae7210396836180bd203bd50) fixes the indentation

* [4aad609](https://github.com/c-cube/qcheck/commit/4aad609afae413878eb19c202fedf6771a7053e0) removes the commented out parameter

* [a4c1e2e](https://github.com/c-cube/qcheck/commit/a4c1e2eb6670f3108a1302a12ab8287ec0d42937) and [c89ab44](https://github.com/c-cube/qcheck/commit/c89ab442a9d2763d6e858a848310c856eedf84ea) fixes the linear-time `char` reducer causing an increase in shrink count

* [c89ab44](https://github.com/c-cube/qcheck/commit/c89ab442a9d2763d6e858a848310c856eedf84ea) and [254e0c2](https://github.com/c-cube/qcheck/commit/254e0c285b281b9345a28b76d3bbb3f062bd25c9) replaces the needless intermediate `Seq` representation in the `string` shrinker

* [04ecdf0](https://github.com/c-cube/qcheck/commit/04ecdf042046b5c103922a0d9bc25e5fe25a22dc) updates the CHANGELOG to mention the `char` shrinker improvement

Overall, with the inclusion of the char improvement this is now down from 78secs to 2secs 🥳 🍾

Green light to merge @vch9?

Absolutely! This is a nice improvement :) The char shrinker makes sense!