c-cube / qcheck

QuickCheck-inspired property-based testing for OCaml.
https://c-cube.github.io/qcheck/
BSD 2-Clause "Simplified" License

Less exhaustive string shrinking #278

Open jmid opened 1 year ago

jmid commented 1 year ago

As the new QCheck.Shrink.string unit tests document, on a failure path (going through all shrinking candidates without luck) the current approach may be a bit too exhaustive: https://github.com/c-cube/qcheck/blob/f37621a4763d556f05e35cefe81d4e29b932be25/test/core/QCheck_unit_tests.ml#L93-L103

By falling back on Shrink.list_spine we get a nice and short candidate sequence when we stay clear of element reduction:

# Shrink.string ~shrink:Shrink.nil "zzzzz" (fun xs -> Printf.printf "%s\n" Print.(string xs));;
zzz
zzzz
zzz
zzzz

However, exhaustively trying to reduce all O(n) entries, say O(log n) times, brings this to O(n log n) which may be a bit much:

# Shrink.string ~shrink:Shrink.char "zzzzz" (fun xs -> Printf.printf "%s\n" Print.(string xs));;
zzz
zzzz
zzz
zzzz
nzzzz
tzzzz
wzzzz
yzzzz
znzzz
ztzzz
zwzzz
zyzzz
zznzz
zztzz
zzwzz
zzyzz
zzznz
zzztz
zzzwz
zzzyz
zzzzn
zzzzt
zzzzw
zzzzy

We are getting bitten by this in ocaml-multicore/multicoretests, as remarked by @edwintorok here: https://github.com/ocaml-multicore/multicoretests/pull/329#issuecomment-1522990777

As an alternative to avoiding character shrinks above a certain threshold, one could consider other approaches, e.g.

edwintorok commented 1 year ago

FWIW I'm using the following string generator/shrinker in my code; it makes the shrinker work reasonably fast, although it is not a general-purpose shrinker.

I tried to make 'shrink_all_chars' apply Shrink.char to each char just once, but even that is too slow, even with a one-step shrinker written using just Char.chr/Char.code/Bytes.set and bit shifts (my strings are sometimes in the range of ~64k to reproduce certain bugs). The workaround there is to artificially introduce smaller limits elsewhere in the code so that bugs can still be hit with much shorter strings.

let truncated_str s =
  if String.length s < 6 then
    String.escaped s
  else
    Printf.sprintf "%s…(len:%d)"
      (String.sub s 0 6 |> String.escaped)
      (String.length s)

let all_bytes = String.init 256 Char.chr

let repeat n = List.init n (fun _ -> all_bytes) |> String.concat ""

let shrink_string_length str =
  let open QCheck in
  str |> String.length |> Shrink.int |>
  QCheck.Iter.map (String.sub str 0)

let shrink_all_chars str yield =
  (* shrinking each char individually would yield too many possibilities
     on long strings if shrinking doesn't reproduce the bug anymore.
     Instead shrink all chars at once, with a single result.
     It will eventually converge on 'aa...a'.
  *)
  let next = String.make (String.length str) 'a' in
  (* avoid infinite iteration: stop when target reached *)
  if not (String.equal next str) then
    yield next

let shrink_string str yield =
  shrink_string_length str yield;
  shrink_all_chars str yield

let bounded_string_arb n =
  (* QCheck.Shrink.string is very slow: first converts to list of chars.
     In our case bugs come from 3 sources:
      * the byte values (e.g. ASCII vs non-ascii)
      * the length of the string (whether it hits various limits or not)
      * the uniqueness of the string (e.g. duplicate inserts)

    Generate a small fully random string of fixed size, and concatenate it with a random length substring of a static string.
    Shrinking will take care of producing smaller and more readable results as needed,
    even below the fixed size
  *)

  assert (n > 4) ;

  let n = n - 4 in
  let long = ref [] in
  let () = if n > 0 then
  (* pregenerate all the long strings by running the shrinker on a static string *)
    let max_repetitions = n / String.length all_bytes in
    let max_str =
      repeat max_repetitions
      ^ String.sub all_bytes 0 (n mod String.length all_bytes)
    in
    shrink_string_length max_str (fun s -> long := s :: !long)
  in
  let gen_long = QCheck.Gen.oneofa @@ Array.of_list !long in
  let gen_string =
    let open QCheck.Gen in
    let* small = string_size @@ return 4 in
    let+ long = gen_long in
    small ^ long
  in
  QCheck.(make
    ~print:truncated_str
    ~small:String.length
    ~shrink:shrink_string
    gen_string)
edwintorok commented 1 year ago

I think it might be useful to guide the shrinker based on the kind of failure we're shrinking. E.g. if we got an exception that says it is length-related (e.g. various internal invariants around sizes/quotas/etc. broken), then try length-based shrinking only (char-based shrinking would waste a lot of time failing to shrink...), and only once that is exhausted try only 2 more steps:

OTOH if the failure says something invalid about the string itself, then you need to try both methods (and hopefully length-based shrinking would first cut down the length considerably...)

This would need some kind of user-provided exception classifier, which is perhaps too specific and would complicate the API too much. Perhaps a better way would be for the shrinker to collect stats on what worked and what didn't for a specific counterexample reduction attempt (stats to be reset every time another qcheck test runs): if the char-based shrinking doesn't result in any reductions, then stop enumerating them after a while for a given "type".

It could also try to "optimistically" jump ahead, e.g. if shrinking previously succeeded in reducing the length to N bytes, then for the next parameter try length N directly (if it doesn't work, fall back to Shrink.int as usual, perhaps noting that N already failed and stopping there).

Finally, for shrinking lists of strings, "delta debugging" techniques might be useful (these are usually used to shrink source code for compilers after splitting by toplevel forms): https://github.com/mpflanzer/delta. This paper in particular gives a nice overview: https://www.st.cs.uni-saarland.de/papers/tse2002/, see Figure 6 for an example. The basic idea is to introduce a shrinking "granularity" and e.g. shrink groups of N/2, then N/4, N/8, etc. I don't think it would work at the char level (it would in fact generate way more possibilities than QCheck.Shrink.string currently does), but it shows that you don't have to immediately split things up into their tiniest components (string -> char); you can try shrinking in larger chunks first.

Finally: why list of chars instead of array of chars? Surely arrays can be indexed more efficiently than lists, or in fact Bytes might be the most efficient form for shrinking.

jmid commented 1 year ago

Thanks for sharing tips, feedback, and pointers!

At the moment, in QCheck(1) we are using the interface in which shrinkers are 'a -> 'a Iter.t functions that are restarted after each successful shrink step. This is a fairly common setup within QuickCheck frameworks, I think, all derived from the original Haskell implementation. With the FP roots, this interface is not necessarily compatible with delta debugging, which in the above-linked presentation seems to require some state to continue reducing in a particular span of "deltas" between invocations.
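To make the restart behaviour concrete, here is a minimal standalone sketch (hypothetical helper names, not QCheck's actual internals) of such a greedy shrink loop, representing iterators as plain `('a -> unit) -> unit` functions: the loop takes the first candidate that still fails the property and restarts the whole iterator on it.

```ocaml
(* An iterator is a function that feeds candidates to a callback. *)
type 'a iter = ('a -> unit) -> unit

(* Specialised to strings for simplicity of the sketch. *)
exception Found_smaller of string

(* Greedy loop: take the first candidate that still fails the property,
   then restart the shrinker from scratch on that candidate. *)
let rec shrink_loop (shrink : string -> string iter) (fails : string -> bool) s =
  match shrink s (fun cand -> if fails cand then raise (Found_smaller cand)) with
  | () -> s (* iterator exhausted without improvement: s is locally minimal *)
  | exception Found_smaller cand -> shrink_loop shrink fails cand

(* A toy shrinker: try the first half, then drop the last character. *)
let halves s yield =
  let n = String.length s in
  if n > 0 then begin
    yield (String.sub s 0 (n / 2));
    if n > 1 then yield (String.sub s 0 (n - 1))
  end

let () =
  (* the "property" fails whenever the string contains a 'z' *)
  let fails s = String.contains s 'z' in
  print_endline (shrink_loop halves fails "azzzzzzz")  (* prints "az" *)
```

Because the iterator is restarted after every successful step, its per-invocation cost multiplies with the number of successful steps, which is the compounding effect discussed below.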

The repeated restarting also explains why an O(n log n) iterator function is too eager/exhaustive: if it is restarted, say, O(n) times, we end up spending O(n^2 log n)... :grimacing:

We did have a more exhaustive list shrinker at some point, but its performance was terrible on larger lists once we started measuring in #177. The shrinker improvements in #242 (slightly tweaked in yesterday's #277) thus stick to a sub-linear O(log n) iterator function for reducing the length of lists and (transitively) strings. In that light, it is a bit funny in hindsight to then ruin the resulting Shrink.string performance by spending O(n log n) to reduce each character with Shrink.char in the 'element reduction' phase following the fast 'size reduction'... :sweat_smile:
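For illustration, here is a rough standalone sketch (not the actual Shrink.list_spine code) of a sub-linear length shrinker in this spirit: it emits only O(log n) candidates by halving the amount of removed material on each step, rather than trying every length.

```ocaml
(* Sketch of a sub-linear length shrinker: for a string of length n, emit
   prefixes of lengths n - n/2, n - n/4, n - n/8, ..., n - 1, i.e. only
   O(log n) candidates in total. *)
let shrink_length s yield =
  let n = String.length s in
  let rec loop chunk =
    if chunk >= 1 then begin
      yield (String.sub s 0 (n - chunk));
      loop (chunk / 2)
    end
  in
  (* max 1 ensures a 1-char string still shrinks to the empty string *)
  if n > 0 then loop (max 1 (n / 2))

let () =
  (* For a 16-char input this emits 4 candidates, of lengths 8, 12, 14, 15. *)
  shrink_length (String.make 16 'z')
    (fun c -> Printf.printf "%d " (String.length c));
  print_newline ()
```

Restarting such an iterator O(log n) times keeps the total number of emitted candidates polylogarithmic, which is what the linear char-reduction phase then spoils.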

Finally: why list of chars instead of array of chars? Surely arrays can be indexed more efficiently than lists, or in fact Bytes might be the most efficient form for shrinking.

Good point, I completely agree that the string shrinker may very likely be improved by not reducing it to char lists, perhaps in particular in the 'element reduction' phase...

edwintorok commented 1 year ago

With the FP roots, this interface is not necessarily compatible with delta debugging, which in the above-linked presentation seems to require some state to continue reducing in a particular span of "deltas" between invocations.

Indeed there is no convenient place to store the "granularity" (unless we do something really "unfunctional" and use an ephemeron to store the last granularity for a particular generated string, or other kinds of globals...), and without it shrinking would perform a lot of unnecessary additional steps (as mentioned in the paper). Here is the best I could get so far trying to emulate delta debugging, but in this form it is in fact worse than QCheck's current shrinker:

open QCheck

(** [(start, stop)] interval from [start] to [stop]; [start] is included and [stop] is excluded *)
type intervals = (int*int) Iter.t

let interval_neq (s1,e1) (s2,e2) =
    not (Int.equal s1 s2 && Int.equal e1 e2)

let intervals_to_string s : intervals Iter.t -> string Iter.t =
    let b = Buffer.create (String.length s) in
    Iter.map @@ fun intervals ->
    Buffer.clear b;
    (* TODO: this could be further optimized to coalesce adjacent intervals so we can blit bigger pieces in one go *)
    let () = intervals @@ fun (start,stop) ->
        Buffer.add_substring b s start (stop-start)
    in
    Buffer.contents b

let increase_granularity : intervals -> intervals = fun intervals yield ->
    intervals @@ fun (start, stop) ->
    let mid = (start + stop) lsr 1 in (* avoids negative overflow *)
    assert (start <> mid);
    yield (start, mid);
    yield (mid, stop)

let rec shrink_intervals (n, test_complements) intervals : intervals Iter.t =
    let n, intervals = n lsr 1, increase_granularity intervals in
    if n = 0 then Iter.empty
    else
    let complement interval =
        Iter.filter (interval_neq interval) intervals
    in
    fun yield ->
    (* test subsets *)
    Iter.map Iter.return intervals yield;

    (* test complements *)
    if test_complements then
        Iter.map complement intervals yield;

    (* increase granularity *)
    shrink_intervals (n, true) intervals yield

let shrink_intervals start stop =
    shrink_intervals (stop, false) @@ Iter.return (start, stop)

let shrink_string_delta s =
    let intervals_to_string = intervals_to_string s in
    shrink_intervals 0 (String.length s)
    |> intervals_to_string

let test_f s =
    print_endline s;
    not (
        String.contains s '1' &&
        String.contains s '7' &&
        String.contains s '8'
    )

let () =
    let str_test =
        QCheck.make ~print:Fun.id ~small:String.length
        ~shrink:shrink_string_delta
        (QCheck.Gen.return "12345678")
    in
    let str_test0 =
        QCheck.make ~print:Fun.id ~small:String.length
        ~shrink:QCheck.Shrink.string
        (QCheck.Gen.return "12345678")
    in
    QCheck_runner.run_tests_main
    [ QCheck.Test.make str_test test_f
    ; QCheck.Test.make str_test0 test_f
    ]

However, even if we don't emulate delta debugging, it might be worthwhile to reduce a set of intervals first (as shown above, those intervals don't need to be stored; they can be an (int*int) Iter.t or (int*int) Seq.t), and then build the strings only at the end, which for large strings can take advantage of fast blits of substrings.

jmid commented 1 year ago

Thanks for sharing! As far as I can tell, this (a) doesn't attempt shrinking of the individual characters and (b) is not sub-linear. Here's a quick ASCII chart plotting the increasing number of shrinking candidates as the input size goes up:

utop # for i=1 to 20 do shrink_string_delta (String.make i 'X') (fun _ -> print_char '#'); print_newline () done;;

##
##
##########
##########
##########
##########
##########################
##########################
##########################
##########################
##########################
##########################
##########################
##########################
##########################################################
##########################################################
##########################################################
##########################################################
##########################################################

The latter hurts (as you pointed out earlier) when repeatedly restarting the iterator as QCheck(1)'s shrinker does. We actually had something closer to this for lists, initially added in e7b44ac as a response to #32 and later improved slightly in #63 (and described in #64). The performance was measured in the shrinker benchmark suite in #177, which demonstrated performance problems on larger input lists. The current sub-linear list shrinker in #242 was added in response to this (and slightly tweaked recently in #277).

Here's a plot for comparison (if we disregard the element shrinking):

utop # for i=1 to 20 do Shrink.string ~shrink:Shrink.nil (String.make i 'X') (fun _ -> print_char '#'); print_newline () done;;
#
###
####
####
#####
#####
#####
#####
######
######
######
######
######
######
######
######
#######
#######
#######
#######

As I see it, this nice logarithmic ASCII curve is killed by bolting on a linear-time char-level shrinker that runs afterwards:

utop # for i=1 to 20 do Shrink.string ~shrink:Shrink.char (String.make i 'X') (fun _ -> print_char '#'); print_newline () done;;
####
#########
#############
################
####################
#######################
##########################
#############################
#################################
####################################
#######################################
##########################################
#############################################
################################################
###################################################
######################################################
##########################################################
#############################################################
################################################################
###################################################################

Earlier today, I was trying out a strategy to "zero-out" blocks of sqrt size with 'a's:

utop # let rec all_as s i j =
  if i >= j
  then true
  else s.[i] = 'a' && all_as s (i+1) j

let string_shrink char_shrink s yield =
  let open QCheck in
  let a_count = String.fold_left (fun count c -> if c = 'a' then count+1 else count) 0 s in
  let len = String.length s in
  let len_sqrt = int_of_float (ceil (sqrt (float_of_int len))) in
  if len <= 5 || a_count >= len - len_sqrt
  then Shrink.string ~shrink:char_shrink s yield
  else
    begin
      Shrink.string ~shrink:Shrink.nil s yield;
      let rec loop i =
        if i < len
        then
          begin
            (if not (all_as s i (i+len_sqrt))
             then
               let s = String.init len (fun j -> if i <= j && j < i+len_sqrt then 'a' else s.[j]) in
               yield s);
            loop (i+len_sqrt)
          end
      in
      loop 0
    end
utop # string_shrink Shrink.char "zzzzzzzzzzzzzzzzz" print_endline;;
zzzzzzzzz
zzzzzzzzzzzzz
zzzzzzzzzzzzzzz
zzzzzzzzzzzzzzzz
zzzzzzzzzzzzzzz
zzzzzzzzzzzzzzzz
zzzzzzzzzzzzzzzz
aaaaazzzzzzzzzzzz
zzzzzaaaaazzzzzzz
zzzzzzzzzzaaaaazz
zzzzzzzzzzzzzzzaa

This scales better as the size goes up AFAICS:

utop # for i=1 to 20 do string_shrink Shrink.char (String.make i 'X') (fun _ -> print_char '#'); print_newline () done;;
####
#########
#############
################
####################
#######
########
########
#########
#########
#########
#########
##########
##########
##########
##########
###########
###########
###########
###########

Caveats:

edwintorok commented 1 year ago

Perhaps the choice of char shrinker (or its depth) could be given to the end user of the API (with a good default...), e.g. the ability to turn off char shrinking completely, which I don't think is currently possible (if you leave it as none you get a default shrinker instead of no shrinker). I'm not sure how to do that in a way compatible with the current API, unless there is a special Shrinker.none that the string shrinker detects, so it doesn't even need to explode the string into characters only to find out there is nothing to shrink...

Then perhaps some heuristics could be used, based on the length of the string, to decide what kind of char shrinker to use. E.g. we could give the shrinker a certain amount of "fuel", which it can spend as it best sees fit: either on shrinking the length of the string, or, once that is exhausted, on shrinking characters. A Sys.max_string_length of 2^57 means that if length shrinking is logarithmic, a fuel of 57 (or some constant multiple of it) can shrink even the theoretically largest string you can ever create. Whatever fuel is left over from shrinking the length can be spent on shrinking characters, which would allow a bit more exploration of the state space for short strings (where building all those variants is quicker). Fuel would be consumed based on either spending CPU time processing an item or allocating.

So in that sense length shrinking would consume O(1) fuel for allocation (it calls the allocation function once; for simplicity let's assume that allocating 1 MiB is as cheap as allocating 1 byte, which might even be true if it all fits in the minor heap and is allocated by the bump allocator), and O(n) processing time, although O(1) function calls to perform that processing. Similarly, exploding the string itself (regardless of how many characters are shrunk) would be O(n) allocation initially, followed by potentially O(n) processing time, including O(n) shrinker calls.
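As a rough standalone illustration of the fuel idea (names and API are hypothetical, not QCheck's), one could wrap any iterator-style shrinker so that emitting a candidate costs one unit of fuel, and the wrapper stops yielding once the budget is spent:

```ocaml
(* Wrap a shrinker so that each emitted candidate costs one unit of "fuel";
   once the budget is spent, no further candidates reach the consumer.
   For simplicity the underlying iterator still runs to completion; a real
   implementation would abort it early (e.g. with an exception). *)
let with_fuel budget (shrink : string -> (string -> unit) -> unit) s yield =
  let fuel = ref budget in
  shrink s (fun cand ->
      if !fuel > 0 then begin
        decr fuel; (* each emitted candidate consumes one unit *)
        yield cand
      end)

(* An over-eager char shrinker for demonstration: one candidate per
   position, replacing that position's character with 'a'. *)
let blank_each_char s yield =
  String.iteri
    (fun i _ ->
       yield (String.mapi (fun j c -> if i = j then 'a' else c) s))
    s

let () =
  (* Uncapped this would emit 10 candidates; with a fuel of 3 only the
     first 3 are emitted. *)
  with_fuel 3 blank_each_char (String.make 10 'z') print_endline
```

A fuller version would let the length-shrinking phase draw from the same budget first, leaving the remainder for character shrinking, as described above.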

There is a balance here on how to measure things, so for simplicity I propose to consume fuel in this way instead:

And the idea of 'fuel' is already part of how recursive data structures are generated, so why not use that idea for shrinking as well? (Especially since in QCheck2 it looks like shrinking works on a tree-like structure, and it can probably track fuel internally without having to change the 'set_shrink' API...) For QCheck version 1, fixing this in a compatible way might be trickier (e.g. ?n:int -> 'a -> 'a Iter.t), although I'm not entirely sure whether that would break existing code that doesn't know about the fuel.

Then small values would do more exhaustive shrinking, and large values (that cannot be further shrunk in length) would perform more limited shrinking.

What do you think?

jmid commented 1 year ago

Perhaps the choice of char shrinker (or its depth) could be given to the end user of the API (with a good default...), e.g. the ability to turn off char shrinking completely, which I don't think is currently possible (if you leave it as none you get a default shrinker instead of no shrinker). I'm not sure how to do that in a way compatible with the current API.

You can use

Shrink.string ~shrink:Shrink.nil

as I do above in the second plot. Plots 2 and 3 were meant to illustrate the difference between shrinking strings with and without char/element reductions.

I also realize now that my message above was unclear: I am measuring and reasoning about the complexity of the shrinkers in terms of the emitted shrinking candidates (the plots print a '#' for each candidate). This was not at all clear from my message, in hindsight.

From having spent time collecting the shrinker benchmarks in #177 and later trying to improve the QCheck(1) shrinker performance, allocations and processing aren't as dominant as the number of emitted shrinking candidates. Also, O(log n) shrinking candidates just compose a lot better when put inside a list, which is again inside tuples (or the function graphs underlying the function generator and shrinker), ...

[...] Then small values would do more exhaustive shrinking, and large values (that cannot be further shrunk in length) would perform more limited shrinking.

What do you think?

It sounds intriguing - but also ambitious! I have to admit that I currently have limited time for QCheck work, so it is on a by-need basis - as here - when it turns out that a too eager string shrinker is hurting us.