Open jmid opened 1 year ago
FWIW, I'm using the following string generator/shrinker in my code. It makes the shrinker reasonably fast, although it is not a general-purpose shrinker.
I tried to make `shrink_all_chars` apply `Shrink.char` to each char just once, but even that is too slow, even if I write a one-step shrinker using just `Char.chr`/`Char.code`/`Bytes.set` and bit shifts (my strings are sometimes in the range of ~64k to reproduce certain bugs). A workaround is to artificially introduce smaller limits elsewhere in the code so that bugs can still be hit with much shorter strings.
```ocaml
let truncated_str s =
  if String.length s < 6 then
    String.escaped s
  else
    Printf.sprintf "%s…(len:%d)"
      (String.sub s 0 6 |> String.escaped)
      (String.length s)

let all_bytes = String.init 256 Char.chr

let repeat n = List.init n (fun _ -> all_bytes) |> String.concat ""

let shrink_string_length str =
  let open QCheck in
  str |> String.length |> Shrink.int |> QCheck.Iter.map (String.sub str 0)

let shrink_all_chars str yield =
  (* shrinking each char individually would yield too many possibilities
     on long strings if shrinking doesn't reproduce the bug anymore.
     Instead shrink all chars at once, with a single result.
     It will eventually converge on 'aa...a'. *)
  let next = String.make (String.length str) 'a' in
  (* avoid infinite iteration: stop when target reached *)
  if not (String.equal next str) then
    yield next

let shrink_string str yield =
  shrink_string_length str yield;
  shrink_all_chars str yield

let bounded_string_arb n =
  (* QCheck.Shrink.string is very slow: first converts to list of chars.
     In our case bugs come from 3 sources:
     * the byte values (e.g. ASCII vs non-ASCII)
     * the length of the string (whether it hits various limits or not)
     * the uniqueness of the string (e.g. duplicate inserts)
     Generate a small fully random string of fixed size, and concatenate it
     with a random-length substring of a static string.
     Shrinking will take care of producing smaller and more readable results
     as needed, even below the fixed size. *)
  assert (n > 4);
  let n = n - 4 in
  let long = ref [] in
  let () =
    if n > 0 then
      (* pregenerate all the long strings by running the shrinker on a static string *)
      let max_repetitions = n / String.length all_bytes in
      let max_str =
        repeat max_repetitions
        ^ String.sub all_bytes 0 (n mod String.length all_bytes)
      in
      shrink_string_length max_str (fun s -> long := s :: !long)
  in
  let gen_long = QCheck.Gen.oneofa @@ Array.of_list !long in
  let gen_string =
    let open QCheck.Gen in
    let* small = string_size @@ return 4 in
    let+ long = gen_long in
    small ^ long
  in
  QCheck.(make ~print:truncated_str ~small:String.length ~shrink:shrink_string gen_string)
```
I think it might be useful to guide the shrinker based on the kind of failure we're shrinking. E.g. if we got an exception that says it is length-related (e.g. various internal invariants around sizes/quota/etc. broken), then try length-based shrinking only (char-based shrinking would waste a lot of time failing to shrink...), and only once that is exhausted try only 2 more steps:

- reducing several chars simultaneously, `"zzzzz"` -> `"aaazz"`
OTOH if the failure says something is invalid about the string itself, then you need to try both methods (and hopefully length-based shrinking would first cut down the length considerably...).
This would need some kind of user-provided exception classifier, which is perhaps too specific and would complicate the API too much. Perhaps a better way would be for the shrinker to collect stats on what worked and what didn't for a specific counterexample reduction attempt (stats to be reset every time another qcheck test runs): if char-based shrinking doesn't result in any reductions, then stop enumerating such candidates after a while for a given "type".
It could also try to "optimistically" jump ahead, e.g. if previously shrinking succeeded to reduce length to N bytes, then for the next parameter try length N directly (if it doesn't work fall back to Shrink.int as usual, perhaps noting that N already failed and stop there).
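The "optimistic jump" could be sketched roughly like this, using only the stdlib. To be clear, `last_good_len` and `shrink_length_optimistic` are hypothetical names of my own, not QCheck API, and storing the previous length in a global ref is admittedly stateful:

```ocaml
(* Hypothetical sketch of the "optimistic jump" idea: remember the length
   the previous counterexample shrank to, and try that prefix first on the
   next value, before falling back to an ordinary candidate. *)
let last_good_len : int option ref = ref None

let shrink_length_optimistic s yield =
  let n = String.length s in
  (match !last_good_len with
   | Some k when k < n -> yield (String.sub s 0 k) (* jump ahead *)
   | _ -> ());
  (* fall back to a usual candidate, e.g. keeping half the string *)
  if n > 1 then yield (String.sub s 0 (n / 2))
```

A real version would also record whether the jump succeeded, so a failed length N is not retried, as suggested above.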
Finally, for shrinking lists of strings, "delta debugging" techniques might be useful (these are usually used to shrink source code for compilers, after splitting by toplevel forms): https://github.com/mpflanzer/delta. This paper in particular gives a nice overview: https://www.st.cs.uni-saarland.de/papers/tse2002/, see Figure 6 for an example. The basic idea is to introduce a shrinking "granularity", and e.g. shrink groups of N/2, then N/4, N/8, etc. I don't think it would work at the char level (it would in fact generate way more possibilities than QCheck.Shrink.string currently does), but it shows that you don't have to immediately split things up into their tiniest components (string -> char), but can try shrinking in larger chunks first.
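To illustrate the granularity idea, here is a toy stdlib-only enumeration (my own simplification, not the actual ddmin algorithm; `chunk_removals` and `delta_candidates` are made-up names): at granularity k the string is split into k chunks, each chunk is removed in turn, and then the granularity is doubled.

```ocaml
(* At granularity [k], split [s] into [k] chunks and yield [s] with each
   chunk removed in turn. *)
let chunk_removals k s =
  let len = String.length s in
  let chunk = max 1 (len / k) in
  let rec loop i acc =
    if i >= len then List.rev acc
    else
      let stop = min len (i + chunk) in
      (* drop the chunk [i, stop) by concatenating the two remaining pieces *)
      let candidate = String.sub s 0 i ^ String.sub s stop (len - stop) in
      loop stop (candidate :: acc)
  in
  loop 0 []

(* Increase granularity (k = 2, 4, 8, ...) until chunks are single chars. *)
let delta_candidates s =
  let len = String.length s in
  let rec go k acc =
    if k > len then acc
    else go (2 * k) (acc @ chunk_removals k s)
  in
  if len = 0 then [] else go 2 []
```

Note how coarse chunks come first: for `"abcd"` the halves `"cd"`/`"ab"` are tried before the four single-char removals.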
Finally: why list of chars instead of array of chars? Surely arrays can be indexed more efficiently than lists, or in fact Bytes might be the most efficient form for shrinking.
Thanks for sharing tips, feedback, and pointers!
At the moment, in QCheck(1) we are using the interface in which shrinkers are `'a -> 'a Iter.t` functions that are restarted after each successful shrink step. This is a fairly common setup within QuickCheck frameworks I think, all derived from the original Haskell implementation. With the FP roots, this interface is not necessarily compatible with delta debugging, which in the above-linked presentation seems to require some state to continue reducing in a particular span of "deltas" between invocations.
The repeated restarting also explains why an O(n log n) iterator function is too eager/exhaustive: if it is restarted, say, O(n) times, we end up spending O(n^2 log n)... :grimacing:
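To make the blow-up concrete, here is a tiny stdlib-only model (my own simplification, not QCheck code): assume each successful shrink step only reduces the size n by one and restarts the iterator, so a per-restart candidate count f(n) is summed over all O(n) restarts.

```ocaml
(* Toy model of "restart after each successful shrink": if each restart
   enumerates [f n] candidates and each success shrinks n by 1, the
   total work is the sum of f over all sizes from n down to 1. *)
let total_candidates f n =
  let rec loop n acc = if n = 0 then acc else loop (n - 1) (acc + f n) in
  loop n 0

(* With a linear per-restart iterator the total is quadratic: *)
let quadratic = total_candidates (fun n -> n) 100  (* 5050 *)

(* while a logarithmic per-restart iterator keeps the total near n log n: *)
let near_n_log_n =
  total_candidates (fun n -> int_of_float (ceil (log (float n) /. log 2.))) 100
```

The same reasoning with f(n) = n log n gives the O(n^2 log n) total mentioned above.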
We did have a more exhaustive list shrinker at some point, but its performance was terrible on larger lists once we started measuring in #177. The shrinker improvements in #242 (slightly tweaked in yesterday's #277) thus stick to a sub-linear O(log n) iterator function for reducing the length of lists and (transitively) strings.
In that light, it is a bit funny in hindsight to then ruin the resulting `Shrink.string` performance by spending O(n log n) to reduce each character with `Shrink.char` in the 'element reduction' phase following the fast 'size reduction'... :sweat_smile:
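For illustration, the sub-linear length reduction can be sketched with the stdlib alone. This is a simplified stand-in (my own code, not QCheck's actual `Shrink.int`-based implementation): each candidate halves the remaining distance to the full length, so a length-n string yields only O(log n) prefixes.

```ocaml
(* Simplified sketch of a sub-linear length shrinker: start from the
   empty prefix and repeatedly halve the gap to the original length,
   yielding O(log n) candidates in total. *)
let shrink_length s yield =
  let n = String.length s in
  let rec loop cur =
    if cur < n then begin
      yield (String.sub s 0 cur);
      loop (cur + (n - cur + 1) / 2)  (* halve the remaining gap *)
    end
  in
  if n > 0 then loop 0
```

E.g. a string of length 8 yields prefixes of lengths 0, 4, 6, 7 — four candidates rather than eight.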
> Finally: why list of chars instead of array of chars? Surely arrays can be indexed more efficiently than lists, or in fact Bytes might be the most efficient form for shrinking.
Good point, I completely agree that the `string` shrinker may very likely be improved by not reducing it to `char list`s - perhaps in particular in the 'element reduction' phase...
> With the FP roots, this interface is not necessarily compatible with delta debugging, which in the above-linked presentation seems to require some state to continue reducing in a particular span of "deltas" between invocations.
Indeed, there is no convenient place to store the "granularity" (unless we do something really "unfunctional" and use an ephemeron to store the last granularity for a particular generated string, or other kinds of globals...), and without it shrinking would perform a lot of unnecessary additional steps (as mentioned in the paper). Here is the best I could get so far trying to emulate delta debugging, but in this form it is in fact worse than QCheck's current shrinker:
```ocaml
open QCheck

(** [[start, stop)] interval from start to stop, start is included and stop is excluded *)
type intervals = (int * int) Iter.t

let interval_neq (s1, e1) (s2, e2) =
  not (Int.equal s1 s2 && Int.equal e1 e2)

let intervals_to_string s : intervals Iter.t -> string Iter.t =
  let b = Buffer.create (String.length s) in
  Iter.map @@ fun intervals ->
  Buffer.clear b;
  (* TODO: this could be further optimized to coalesce adjacent intervals
     so we can blit bigger pieces in one go *)
  let () = intervals @@ fun (start, stop) ->
    Buffer.add_substring b s start (stop - start)
  in
  Buffer.contents b

let increase_granularity : intervals -> intervals = fun intervals yield ->
  intervals @@ fun (start, stop) ->
  let mid = (start + stop) lsr 1 in (* avoids negative overflow *)
  assert (start <> mid);
  yield (start, mid);
  yield (mid, stop)

let rec shrink_intervals (n, test_complements) intervals : intervals Iter.t =
  let n, intervals = n lsr 1, increase_granularity intervals in
  if n = 0 then Iter.empty
  else
    let complement interval =
      Iter.filter (interval_neq interval) intervals
    in
    fun yield ->
      (* test subsets *)
      Iter.map Iter.return intervals yield;
      (* test complements *)
      if test_complements then
        Iter.map complement intervals yield;
      (* increase granularity *)
      shrink_intervals (n, true) intervals yield

let shrink_intervals start stop =
  shrink_intervals (stop, false) @@ Iter.return (start, stop)

let shrink_string_delta s =
  let intervals_to_string = intervals_to_string s in
  shrink_intervals 0 (String.length s)
  |> intervals_to_string

let test_f s =
  print_endline s;
  not (
    String.contains s '1' &&
    String.contains s '7' &&
    String.contains s '8'
  )

let () =
  let str_test =
    QCheck.make ~print:Fun.id ~small:String.length
      ~shrink:shrink_string_delta
      (QCheck.Gen.return "12345678")
  in
  let str_test0 =
    QCheck.make ~print:Fun.id ~small:String.length
      ~shrink:QCheck.Shrink.string
      (QCheck.Gen.return "12345678")
  in
  QCheck_runner.run_tests_main
    [ QCheck.Test.make str_test test_f
    ; QCheck.Test.make str_test0 test_f
    ]
```
However, even if we don't emulate delta debugging, it might be worthwhile to reduce a set of intervals first (as shown above, those intervals don't need to be stored: they can be an `(int*int) Iter.t` or `(int*int) Seq.t`), and then build the strings just at the end, which for large strings can take advantage of fast blits of substrings.
Thanks for sharing! As far as I can tell, this (a) doesn't attempt shrinking of the individual characters and (b) is not sub-linear. Here's a quick ASCII chart plotting the increasing number of shrinking candidates as the input size goes up:
```
utop # for i=1 to 20 do shrink_string_delta (String.make i 'X') (fun _ -> print_char '#'); print_newline () done;;
##
##
##########
##########
##########
##########
##########################
##########################
##########################
##########################
##########################
##########################
##########################
##########################
##########################################################
##########################################################
##########################################################
##########################################################
##########################################################
```
The latter hurts (as you pointed out earlier) when repeatedly restarting the iterator, as QCheck(1)'s shrinker does. We actually had something closer to this for lists, initially added in e7b44ac as a response to #32 and later improved slightly in #63 (and described in #64). The performance was measured in the shrinker benchmark suite in #177, which demonstrated performance problems on larger input lists. The current sub-linear list shrinker in #242 was added in response to this (and slightly tweaked recently in #277).
Here's a plot for comparison (if we disregard the element shrinking):
```
utop # for i=1 to 20 do Shrink.string ~shrink:Shrink.nil (String.make i 'X') (fun _ -> print_char '#'); print_newline () done;;
#
###
####
####
#####
#####
#####
#####
######
######
######
######
######
######
######
######
#######
#######
#######
#######
```
As I see it, this nice logarithmic ASCII curve is killed by bolting on a linear-time `char`-level shrinker that runs afterwards:
```
utop # for i=1 to 20 do Shrink.string ~shrink:Shrink.char (String.make i 'X') (fun _ -> print_char '#'); print_newline () done;;
####
#########
#############
################
####################
#######################
##########################
#############################
#################################
####################################
#######################################
##########################################
#############################################
################################################
###################################################
######################################################
##########################################################
#############################################################
################################################################
###################################################################
```
Earlier today, I was trying out a strategy to "zero-out" blocks of sqrt size with 'a's:
```
utop # let rec all_as s i j =
  if i >= j
  then true
  else s.[i] == 'a' && all_as s (i+1) j

let string_shrink char_shrink s yield =
  let open QCheck in
  let a_count = String.fold_left (fun count c -> if c = 'a' then count+1 else count) 0 s in
  let len = String.length s in
  let len_sqrt = int_of_float (ceil (sqrt (float_of_int len))) in
  if len <= 5 || a_count >= len - len_sqrt
  then Shrink.string ~shrink:char_shrink s yield
  else begin
    Shrink.string ~shrink:Shrink.nil s yield;
    let rec loop i =
      if i < len
      then begin
        (* clamp to [len] to avoid indexing past the end on the last block *)
        (if not (all_as s i (min len (i+len_sqrt)))
         then
           let s = String.init len (fun j -> if i <= j && j < i+len_sqrt then 'a' else s.[j]) in
           yield s);
        loop (i+len_sqrt)
      end
    in
    loop 0
  end;;

utop # string_shrink Shrink.char "zzzzzzzzzzzzzzzzz" print_endline;;
zzzzzzzzz
zzzzzzzzzzzzz
zzzzzzzzzzzzzzz
zzzzzzzzzzzzzzzz
zzzzzzzzzzzzzzz
zzzzzzzzzzzzzzzz
zzzzzzzzzzzzzzzz
aaaaazzzzzzzzzzzz
zzzzzaaaaazzzzzzz
zzzzzzzzzzaaaaazz
zzzzzzzzzzzzzzzaa
```
This scales better as the size goes up AFAICS:
```
utop # for i=1 to 20 do string_shrink Shrink.char (String.make i 'X') (fun _ -> print_char '#'); print_newline () done;;
####
#########
#############
################
####################
#######
########
########
#########
#########
#########
#########
##########
##########
##########
##########
###########
###########
###########
###########
```
Caveats:

- it only falls back to the full `char`-shrinking pass if the input contains sufficiently few non-`'a'`s, say O(sqrt |s|)

Perhaps the choice of char shrinker (or its depth) could be given to the end-user of the API (with a good default...), e.g. the ability to turn off char shrinking completely, which I don't think is currently possible (if you leave it as none you get a default shrinker instead of no shrinker), although I'm not sure how to do that in a way compatible with the current API. Unless there is a special Shrinker.none that the string shrinker detects, so it doesn't even need to explode the string into characters only to find out there is nothing to shrink...
Then perhaps some heuristics could be used, based on the length of the string, to pick what kind of char shrinker to use. E.g. we could give the shrinker a certain amount of "fuel", which it can spend as it best sees fit: either on shrinking the length of the string, or once that is exhausted, on shrinking characters. A `Sys.max_string_length` of 2^57 means that if length shrinking is logarithmic, a fuel of 57 (or some constant multiple of it) can shrink even the theoretically largest string you could ever create. And whatever fuel is left over from shrinking the length can be spent on shrinking characters, which would allow a bit more exploration of the state space for short strings (where building all those variants is quicker). Fuel would be consumed based on either spending CPU time processing an item or allocating.
So in that sense, length shrinking would consume 'O(1)' fuel for allocation (it calls the allocation function once; for simplicity let's assume that allocating 1MiB is as cheap as allocating 1 byte, which might even be true if it all fits in the minor heap and is allocated by the bump allocator), and 'O(n)' processing time, although O(1) function calls to perform that processing. Similarly, exploding the string itself (regardless of how many characters are shrunk) would be O(n) allocation initially, followed by potentially O(n) processing time, including O(n) shrinker calls.
There is a balance here in how to measure things, so for simplicity I propose to consume fuel this way instead:

- calling another shrinker would give the fuel to that shrinker and consume at least 1 unit on the invocation itself
- calls to stdlib runtime functions may initially be considered to consume 0 fuel, and if needed some constant ratio vs calls to other shrinkers, if it is determined experimentally that 0 is not good enough. E.g. `String.sub` producing N characters vs calling N char shrinkers would need some measurements on what the actual ratios are, but `String.sub` is a lot cheaper than N individual char-processing functions (at the very least it will do copying word-by-word, not char-by-char).
And the idea of 'fuel' is already part of how recursive data structures are generated, so why not use that idea for shrinking as well? (Especially since in QCheck2 it looks like shrinking works on a tree-like structure, and it can probably track fuel internally without having to change the 'set_shrink' API...)
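A minimal sketch of such a fuel budget, assuming a hypothetical wrapper around the existing `'a -> ('a -> unit) -> unit` iterator shape (none of these names exist in QCheck; fuel here is simply decremented per emitted candidate, ignoring the allocation/CPU accounting discussed above):

```ocaml
(* Hypothetical "fuel" wrapper: [with_fuel shrink ~fuel x yield] runs
   [shrink], emits at most [fuel] candidates, and returns the leftover
   fuel so a later phase can spend the rest. (In this toy version the
   underlying iterator still runs to completion; its extra candidates
   are merely suppressed.) *)
let with_fuel (shrink : 'a -> ('a -> unit) -> unit) ~fuel x yield =
  let remaining = ref fuel in
  shrink x (fun candidate ->
      if !remaining > 0 then begin
        decr remaining;
        yield candidate
      end);
  !remaining

(* Usage sketch: spend fuel on length shrinking first, then hand
   whatever is left to a character-shrinking phase. *)
let shrink_with_budget ~fuel shrink_length shrink_chars s yield =
  let left = with_fuel shrink_length ~fuel s yield in
  ignore (with_fuel shrink_chars ~fuel:left s yield)
```

With a logarithmic length phase, a budget of a few hundred units would leave most of the fuel for character shrinking on short strings, while long strings would exhaust it on length reduction alone, matching the behaviour proposed above.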
For QCheck version 1, fixing this in a compatible way might be more tricky (e.g. `?n:int -> 'a -> 'a Iter.t`), although I'm not entirely sure whether that would break existing code that doesn't know about the fuel.
Then small values would do more exhaustive shrinking, and large values (that cannot be further shrunk in length) would perform more limited shrinking.
What do you think?
> Perhaps the choice of char shrinker (or its depth) could be given to the end-user of the API (with a good default...), e.g. have the ability to turn off char shrinking completely, which I don't think is currently possible (if you leave it as none you get a default shrinker instead of no shrinker), although I'm not sure how to do that in a compatible way with the current API.
You can use `Shrink.string ~shrink:Shrink.nil` as I do above in the second plot. Plots 2 and 3 were meant to illustrate the difference between shrinking strings with and without char/element reductions.
I also realize now that my message above was unclear: I am measuring and reasoning about the complexity of the shrinkers in terms of the emitted shrinking candidates (the plots print a `'#'` for each candidate). This was not at all clear from my message, in hindsight.
From having spent time collecting the shrinker benchmarks in #177 and later trying to improve the QCheck(1) shrinker performance, my experience is that allocations and processing aren't as dominant as the number of emitted shrinking candidates. Also, O(log n) shrinking candidates just compose a lot better when put inside a list, which is again inside tuples (or the function graphs underlying the function generator and shrinker), ...
> [...] Then small values would do more exhaustive shrinking, and large values (that cannot be further shrunk in length) would perform more limited shrinking.
> What do you think?
It sounds intriguing - but also ambitious!
I have to admit that I currently have limited time for QCheck work. So it is on a by-need basis - as here - when it turns out that a too-eager `string` shrinker is hurting us.
As the new `QCheck.Shrink.string` unit tests document, on a failure path (going through all shrinking candidates without luck) the current approach may be a bit too exhaustive: https://github.com/c-cube/qcheck/blob/f37621a4763d556f05e35cefe81d4e29b932be25/test/core/QCheck_unit_tests.ml#L93-L103
By falling back on `Shrink.list_spine` we get a nice and short candidate sequence when we stay clear of element reduction. However, exhaustively trying to reduce all O(n) entries, say O(log n) times, brings this to O(n log n), which may be a bit much.
We are getting bitten by this in ocaml-multicore/multicoretests, as remarked by @edwintorok here: https://github.com/ocaml-multicore/multicoretests/pull/329#issuecomment-1522990777
As an alternative to avoiding character-shrinks above a certain threshold, one could consider alternative approaches, e.g.:

- replacing blocks of `char`s with `'a'`
- reducing several `char`s simultaneously, `"zzzzz"` -> `"aaazz"`