Feature request: Add auto-derivators for structs and enums

vi commented 9 years ago

I want quickcheck to have a compiler plugin that automatically generates Arbitrary impl for structs and enums composed of Arbitrary types, like with serialisation.

That should shrink

extern crate quickcheck;

use quickcheck::Arbitrary;

#[derive(Clone,Debug)]
struct Mumu {
    a : u32,
    b : String,
}

#[derive(Clone,Debug)]
enum Qqq {
    Lol,
    Ror,
    Kokoko(Mumu),
    Totot(u32),
}

impl Arbitrary for Mumu {
    fn arbitrary<G : quickcheck::Gen>(g:&mut G) -> Mumu {
        Mumu {a : Arbitrary::arbitrary(g), b : Arbitrary::arbitrary(g) }
    }
    fn shrink(&self) -> Box<Iterator<Item=Self>> {
        Box::new ( (self.a.clone(), self.b.clone()).shrink().map(|(aa,bb)| Mumu { a: aa, b:bb}))
    }
}

impl Arbitrary for Qqq {
    fn arbitrary<G : quickcheck::Gen>(g:&mut G) -> Qqq {
        let y = g.next_u32() % 4;
        match y {
            0 => Qqq::Lol,
            1 => Qqq::Ror,
            2 => Qqq::Kokoko(Arbitrary::arbitrary(g)),
            3 => Qqq::Totot(Arbitrary::arbitrary(g)),
            _ => panic!(),
        }
    }
    fn shrink(&self) -> Box<Iterator<Item=Self>> {
        match self {
            &Qqq::Totot(ref x) => Box::new(x.shrink().map(|s| Qqq::Totot(s))),
            &Qqq::Kokoko(ref x) => Box::new(x.shrink().map(|s| Qqq::Kokoko(s))),
            _ => quickcheck::empty_shrinker(),
        }
    }
}

#[test]
fn it_works() {
    fn qqq(x : Qqq) -> bool {
        if let Qqq::Kokoko(Mumu { a : v, b: _ }) = x {
            return v != 4
        }
        true
    }

    quickcheck::QuickCheck::new().tests(1000_000_000).quickcheck(qqq as fn(Qqq) -> bool);
}

to

extern crate quickcheck;

use quickcheck::Arbitrary;

#[derive(Clone,Debug,Arbitrary)]
struct Mumu {
    a : u32,
    b : String,
}

#[derive(Clone,Debug,Arbitrary)]
enum Qqq {
    Lol,
    Ror,
    Kokoko(Mumu),
    Totot(u32),
}

#[test]
fn it_works() {
    fn qqq(x : Qqq) -> bool {
        if let Qqq::Kokoko(Mumu { a : v, b: _ }) = x {
            return v != 4
        }
        true
    }

    quickcheck::QuickCheck::new().tests(1000_000_000).quickcheck(qqq as fn(Qqq) -> bool);
}

BurntSushi commented 9 years ago

I've often thought of having something like this, but in practice, almost every Arbitrary impl I write needs to encode some key invariants around the values being generated or shrunk. This would defeat the use of a plugin to do any automatic derivation.

Popping up a level, I'm a little down on plugins since their stabilization future is probably still a far way off. As such, I personally probably won't work on something like this any time soon.

aldanor commented 7 years ago

Bumping this since proc-macros are getting stabilised in a week or two, and a derive-macro is very easy to implement.

This would be potentially very useful for generating Arbitrary implementations for simple structs without complex invariants.

burdges commented 7 years ago

If automatic derivation is problematic, then maybe a generator that wrote a proposed Arbitrary implementation, which the developer could paste into their code and customize?

emk commented 7 years ago

Macros 1.1 now allows custom derive in stable (as of 1.15). This would involve a separate library named something like quickcheck_derive. The main quickcheck library could remain compatible with an older version like 1.10, and you'd only need 1.15 if you wanted to use #[derive(Arbitrary)].

I think the key puzzle is how to handle additional constraints, such as the ones I described in #163. Maybe we should try brainstorming what these constraints look like in typical cases. Here's my example:

#[derive(Arbitrary, Clone)]
pub struct Rect {
    left: usize,
    top: usize,
    width: usize,
    height: usize,
}

Constraints: left + width and top + bottom both fit in a usize.

We can actually add attributes like #[arbitrary = "..."] or #[constraint = "...code..."] if we wish, and use them to control the code we generate (much like serde does). But I think it would work better to collect more examples of constraints before trying to design the actual syntax.

emk commented 7 years ago

Big research dump ahead!

This discussion has links to some prior art and research. There's also an interesting discussion of the tradeoffs involved in default instance distribution, although this mostly seems to affect the size distribution of recursive data structures such as lists (that are considerably rarer in Rust than Haskell). The two papers they mention can be found here and here. While these papers look interesting, I'm not sure if they're directly relevant.

So there are some reasons why #[derive(Arbitrary)] seems slightly sketchy in Haskell, but these reasons don't seem to be especially relevant to Rust.

Another question is the relationship between #[derive(Arbitrary)] and instance invariants. For example, my Rect type should never have left + width large enough to overflow usize, regardless of whether it was created by Arbitrary or by a new method—or by updating it with a &mut self method. This could be represented by a trait Invariant with a method that checks whether the invariant holds. (There are sketches of related ideas here (design by contract) and here (a wrapper type which enforces restrictions on a wrapped type), but neither is mature enough for what we need here.)

So there are a bunch of ways to tackle this, but it's hard to proceed without more examples. So let's dig a bit:

RegexLikeString. This has a bias towards regex characters.
SmallAscii and BiasAscii. More biasing.
Iter. Could be automatically derived.
NonEmptyAscii. More biasing.
PerspectiveBase. This has a very nice helper::reject function which is used to filter out individual field values that fail constraints. Here's another example. (There are also quite a few simpler impls that don't need constraints.)
Small. Reduces the average size by 2.
Graph. Complex and customized, like you'd expect. No way to automate this.
Word. Another trivial case with no contraints.

Here's a code snippet from nalgebra demonstrating reject:

#[cfg(feature="arbitrary")]
impl<N, S> Arbitrary for PerspectiveBase<N, S>
    where N: Real + Arbitrary,
          S: OwnedStorage<N, U4, U4> + Send,
          S::Alloc: OwnedAllocator<N, U4, U4, S> {
    fn arbitrary<G: Gen>(g: &mut G) -> Self {
        let znear  = Arbitrary::arbitrary(g);
        let zfar   = helper::reject(g, |&x: &N| !(x - znear).is_zero());
        let aspect = helper::reject(g, |&x: &N| !x.is_zero());

        Self::new(aspect, Arbitrary::arbitrary(g), znear, zfar)
    }
}

So custom Arbitrary impls seem to fall into several categories:

A biased distribution of an underlying type, typically implemented as a wrapper.
Trivial impls that can be derived with no customization.
Implementations that use per-field filtering using reject to meet various constraints.

I think it would be both desirable and possible to automate (1) somehow, but that's a separate discussion. I'm interested in (2) and (3). I could imagine a syntax like:

#[derive(Arbitrary, Clone)]
pub struct Rect {
    left: usize,
    top: usize,
    #[arbitrary(constraint = "left.checked_add(width).is_some()")]
    width: usize,
    #[arbitrary(constraint = "top.checked_add(height).is_some()")]
    height: usize,
}

A naive implementation might write:

let left = g.gen();
let width = g.gen_constrained(|width| left.checked_add(width).is_some());
let top = g.gen();
let height = g.gen_constrained(|height| top.checked_add(height).is_some());
Rect { left: left, width: width, top: top, height: height }

...but this would tend to fail if left is picked too close to 2^64-1, unless width generates 0 values fairly often. So we might want (not actually valid code; it needs to continue somewhere):

for _ in 0..MAX_TRIES {
    let left = g.gen();
    let width = g.gen_constrained(|width| left.checked_add(width).is_some());
    let top = g.gen();
    let height = g.gen_constrained(|height| top.checked_add(height).is_some());
    return Rect { left: left, width: width, top: top, height: height };
}
panic!("could not generate instance of Rect; check your constraints");

Anyway, this is just a very rough sketch. But it would work for the examples of cases (2) and (3) that I found, and it wouldn't be especially hard to implement on stable Rust. What are some possible ways we could tweak this to make it better?

remexre commented 7 years ago

I'm gonna take a swing at this without constraints; I'll post here if I get it working.

remexre commented 7 years ago

https://github.com/remexre/quickcheck_derive seems to be working? If people want to try it (and rate my first proc-macro), feel free. I'll probably move it into this repo's workspace and PR once I've implemented constraints and shrinking.

edit: Use https://github.com/remexre/quickcheck/tree/derive instead, I'm planning to PR soon

remexre commented 7 years ago

What should the behavior be for trying to derive Arbitrary on an empty enum? It doesn't really make sense to do, so maybe it should be disallowed? But I might want to have e.g. an Arbitrary Either<A, Void> or another generic enum that contains a Void.

Right now, it results in a run-time panic, which is probably the least optimal solution.

remexre commented 7 years ago

Also, I've done a bit of work on constraints, and it's not been great. I'd actually like to see Shrinkable as a separate trait, and valid(&Self) -> bool (with a default impl of always-true) on the Arbitrary trait; I don't think there's a reasonable/sane way to allow the user to specify one function of a derive for shrinking, and the checking for constraints is essentially orthogonal to generating the values, and the logic gets thorny when I try to put them in one function.

I'm going to push what I've got (as https://github.com/remexre/quickcheck_derive/tree/constraints-dead-end), then work on a the above approach.

If anyone has better ideas, certainly share them.

vi commented 7 years ago

Maybe to handle empty enums properly, Arbitrary API

fn arbitrary<G: Gen>(g: &mut G) -> Self;

should be changed to return Option<Self> or Vec<Self>?

BurntSushi commented 7 years ago

We're not changing the API of the Arbitrary trait to accommodate a niche corner case like void types. Panicing seems fine to me, since if that panic occurs, there's a bug somewhere, no?

@remexre Not sure what to do about Shrinkable because I don't really understand the problems you're facing. I don't have time to dive into your code, so if you could describe the problem here I might be able to give feedback. :-)

remexre commented 7 years ago

I don't see trying to test an enum containing a Void (possibly because of generic parameters) as a bug in the user's code per se, but I agree that it probably doesn't merit an API change. The problem is that the code gets a lot more complex when I'm trying to figure out whether an enum branch is possible or not as I'm doing codegen. I'm leaning towards it being a compile-time error for now; the panic is from rand, not quickcheck, so it's a bit bad UX.

The problem with Shrinkable is that I can forsee users wanting custom shrink() implementations, but still wanting to derive arbitrary(). For example, num::Complex and num::rational::Ratio probably want different shrink() implementations, despite having essentially the same internal layout (a struct of two T's). Working around this with an attribute #[arbitrary(shrink_with = "my_custom_function")] probably would be possible, but a bit ugly, I suppose.

Splitting Shrinkable into a different trait, though, would let users impl Shrinkable separately from the derive, and a separate derive for Shrinkable could be added for those who want the default behavior.

BurntSushi commented 7 years ago

@remexre RE shrinkable: That's interesting. I grant that the use case is valid. I wonder how often that's required though, and when it is, I imagine writing out the Arbitrary impl wouldn't be that onerous. Generating values tends to be much easier than shrinking in my experience. That is, as a user, I probably want to derive Arbitrary to get automatic shrinking rather than automatic generation.

But, I am open to splitting the trait, but I'm kind of leaning towards "there's not enough justification" presently...

remexre commented 7 years ago

Okay, for now I'll write it so I could add #[arbitrary(shrink_with = "my_custom_function")] later if need be. I also totally forgot that you could have nested functions, so I can use that for valid(&self) -> bool. I should have a basic PR ready by next Sunday, although that depends how my work schedule ends up.

remexre commented 7 years ago

This can probably be closed when #175 gets merged.

8573 commented 6 years ago

I note that there now is a third-party quickcheck_derive crate, by @panicbit and @llogiq.

panicbit commented 6 years ago

@8573 The implementation in #175 looks more feature rich

BurntSushi commented 4 years ago

https://github.com/BurntSushi/quickcheck/pull/222#issuecomment-573267560

BurntSushi / quickcheck

Feature request: Add auto-derivators for structs and enums #98