Bogdanp / rackcheck

A property-based testing library for Racket.
29 stars 7 forks source link

Streams aren't a general enough abstraction for efficient shrinking #5

Closed laelath closed 2 years ago

laelath commented 3 years ago

I'm writing a generator for program expressions, which means that the number of possible shrinks for every statement grows exponentially with the height of the program. When shrinking something like an if statement you want to shrink on the predicate, shrink either or both of the branches, or produce all the shrinks of the branches. It's possible to prune many shrinks by assuming that if one shrink passes, all the shrinks of that shrink will (i.e. if the shrink to just one of the branch passes, then we can ignore all the shrinks of that branch).

The only way I can think of to do this currently would be to pass the property being checked into the generator so it could use it for manually shrinking the expression, which is flagrantly breaking the abstraction.

Another quirk with the streams is that the only way to get a sensible ordering of the shrinks is to sort based on some expression size function, but doing that is way too costly for the number of shrinks that are produced.

Bogdanp commented 3 years ago

Can you share your generator so I have something to play around with?

laelath commented 3 years ago

Here it is, it's generating a simple subset of scheme, guaranteeing there will be no runtime type errors. I can create a smaller one with similar logic if needed.

#lang racket

(require racket/random rackcheck)

(struct TAny () #:prefab)
(struct TBot () #:prefab)

(struct TInt () #:prefab)
(struct TBool () #:prefab)

;; Note: this only exists right now for future compatibility
;;       function types are not yet a thing
(struct TFun (domain codomain) #:prefab)

(define prim-env
  (list (cons 'add1 (TFun (TInt) (TInt)))
        (cons 'sub1 (TFun (TInt) (TInt)))
        (cons 'zero? (TFun (TInt) (TBool)))
        (cons 'abs (TFun (TInt) (TInt)))
        (cons '- (TFun (TInt) (TInt)))
        (cons 'not (TFun (TAny) (TBool)))))

(define (do-prim p v)
  (match p
    ['add1 (add1 v)]
    ['sub1 (sub1 v)]
    ['zero? (zero? v)]
    ['abs (abs v)]
    ['- (- v)]
    ['not (not v)]))
#lang racket

(require racket/random rackcheck)

;; Produces true if t1 subsumes t2
(define (type-subsumes? t1 t2)
  (match* (t1 t2)
    [((TAny) _) #t]
    [(_ (TBot)) #t]
    [(t t) #t]
    [((TFun d1 c1) (TFun d2 c2))
     (and (type-subsumes? d2 d1)
          (type-subsumes? c1 c2))]
    [(_ _) #f]))

(define (base-type? t)
  (match t
    [(TInt) #t]
    [(TBool) #t]
    [_ #f]))

(define (value? e)
  (or (boolean? e) (number? e)))

(define (filter-env f env)
  (filter (λ (m) (f (cdr m))) env))

(define gen:base-type (gen:choice (gen:const (TInt)) (gen:const (TBool))))

(define (gen:val type)
  (gen:no-shrink
   (match type
     [(TAny) (gen:bind gen:base-type gen:val)]
     [(TInt) (gen:sized (λ (n) (gen:integer-in (- -4 n) (+ 4 n))))]
     [(TBool) gen:boolean]
     [(TFun _ _) (error "Cannot generate values of function types (yet)")]
     [(TBot) (error "wat")])))

(define (gen:simple-expr env type)
  (let ([env-candidates (map car (filter-env (λ (t) (and (base-type? t)
                                                         (type-subsumes? type t)))
                                             env))])
    (if (empty? env-candidates)
        (gen:val type)
        (gen:choice (gen:val type) (gen:one-of env-candidates)))))

(define (gen:prim-app env codomain)
  (make-gen
   (λ (rng size)
     ;; for now applications are primops with a single argument
     ;; this always works assuming gen-app is always called with TInt or TBool
     (let* ([app-candidates (filter-env (curry type-subsumes? (TFun (TBot) codomain)) env)]
            [m (random-ref app-candidates rng)]
            [e-stream ((gen:expr env (TFun-domain (cdr m))) rng (sub1 size))])
       (for*/stream ([try-compute? '(#f #t)]
                     [e e-stream]
                     #:when (or (not try-compute?) (value? e)))
         (if try-compute?
             (do-prim (car m) e)
             `(,(car m) ,e)))))))

(define (gen:if env type)
  (make-gen
   (λ (rng size)
     (let ([e-pred-stream ((gen:expr env (TAny)) rng (quotient size 5))]
           [e-then-stream ((gen:expr env type) rng (quotient (* 2 size) 5))]
           [e-else-stream ((gen:expr env type) rng (quotient (* 2 size) 5))])
       (stream-append
        ;; Not complete, but way faster
        (for/stream ([e-pred e-pred-stream]
                     [e-then e-then-stream]
                     [e-else e-else-stream])
          `(if ,e-pred ,e-then ,e-else))
        e-then-stream
        e-else-stream)))))

;; Env Type natural -> (generator S-Expr)
(define (gen:expr env type)
  (gen:sized
   (λ (size)
     (if (zero? size)
         (gen:simple-expr env type)
         (gen:frequency
          (list (cons 1 (gen:simple-expr env type))
                (cons size (gen:prim-app env type))
                (cons size (gen:if env type))))))))

(define (shrink-expr e)
  (match e
    [(? value? v) empty]
    [(list (? symbol? p) e)
     (append
      (map (λ (s) (list p s)) (shrink-expr e))
      (if (value? e)
          (list (do-prim p e))
          empty))]
    [`(if ,e1 ,e2 ,e3)
     (append
      (list e2 e3)
      (map (λ (s) `(if ,s ,e2 ,e3)) (shrink-expr e1))
      (map (λ (s) `(if ,e1 ,s ,e3)) (shrink-expr e2))
      (map (λ (s) `(if ,e1 ,e2 ,s)) (shrink-expr e3)))]))

The base generator is (gen:expr prim-env (TAny)).

laelath commented 3 years ago

The for/stream in gen:if was originally a for*/stream, but the run time exploded when trying to shrink larger programs. This method is a decent approximation, but there occasionally are programs that it can't shrink, or doesn't shrink enough.

laelath commented 3 years ago

I've been trying to think of possible alternatives, one would be to have generators produces lazy trees instead of streams, which could then be checked and descended QuickCheck style, this should allow composition of shrinking with the composition of generator combinators, but based on some experimenting, I've gotten very stuck on implementing bind.

Another option would be to have shrinking done by a function that is passed into property (or rather into each id binding), this would be the simplest but also the most painful from a basic user perspective since all default shrinking would be lost. Some could be gained back by implementing a (giant, nasty) generic shrinking function, which if you squint kinda looks like what QuickCheck does with the default instances of the Arbitrary type class.

The final alternative I had was to have the generator structure hold a shrinking function, but I'm pretty sure that can't work due to combinators like gen:frequency and gen:one-of, since the function wouldn't have the knowledge of what choice was taken during generation.

Of course the current shrinking algorithm does work perfectly fine for most simple cases if you feel like any structural shrinking algorithm would be too complicated/require too complex of an interface.

Bogdanp commented 3 years ago

Thanks for sharing your thoughts! I haven't yet had the time to look into this and I haven't thought about the implementation deeply in a long time, but the things you listed ring a bell. In particular, I remember that I initially started with the shrinking function approach (your "final alternative") and backtracked to streams, but I don't remember why. Possibly because streams felt simple for most cases, even though I remember being aware that things like sorting and pruning would be problematic. I'll try to take a deeper look at this this weekend or the next, but one thought I have now is that maybe a hybrid approach could somehow work: keep streams around and support generators that have fsm-style shrinking functions w/in them. That would also be nice for preserving backwards-compatibility.

laelath commented 3 years ago

I've been thinking and experimenting with things since this has been a fun problem, and I got a version of my first idea to work (well, finally figured out bind), I'm working on updating some of the generator combinators to use the new system to get a proof-of-concept, but the general idea is that the generator produces lazy trees, but the interface provided to create them is by providing a shrink function and a start value, to avoid the mind-bending reasoning about actually constructing those directly. Unfortunately this approach would likely not be backwards compatible with custom generators, (an interface between the lazy trees and streams could be made, but I'm not sure how to best order them, since the trees are constructed assuming non-backtracking depth first search), but should be backwards compatible with generators built purely from generator combinators.

laelath commented 3 years ago

I've pushed a proof of concept to laelath/rackcheck. It mostly works quite well with a few things missing, namely gen:filter and the gen:hashs. Also I think the lists could be done better, I basically transliterated the code from QuickCheck.

laelath commented 3 years ago

The quirks of this that I have found are that defining shrink functions that recursively shrink their arguments is a bit weird, since you need to create a tree of shrink trees, then map over that with a function that converts the trees into concrete values. Maybe there's a clever way to hide the internal details and pull out a magic shrink "function", but I haven't been able to think of anything so far. And unfortunately this seems somewhat often necessary since using bind to do this chooses an default ordering for all binds (as it needs to), which can produce unusual shrink trees if the secondary generator just does something like cons an element on to the result of the first.

laelath commented 3 years ago

I've added a gen:with-shrink that replaces the shrinks with the recursive application of a custom shrinking function as an escape hatch that feels like it fits in elegantly with the rest of the solution.