Closed vi closed 4 years ago
I've often thought of having something like this, but in practice, almost every Arbitrary
impl I write needs to encode some key invariants around the values being generated or shrunk. This would defeat the use of a plugin to do any automatic derivation.
Popping up a level, I'm a little down on plugins since their stabilization future is probably still a far way off. As such, I personally probably won't work on something like this any time soon.
Bumping this since proc-macros are getting stabilised in a week or two, and a derive-macro is very easy to implement.
This would be potentially very useful for generating Arbitrary
implementations for simple structs without complex invariants.
If automatic derivation is problematic, then maybe a generator that wrote a proposed Arbitrary
implementation, which the developer could paste into their code and customize?
Macros 1.1 now allows custom derive
in stable (as of 1.15). This would involve a separate library named something like quickcheck_derive
. The main quickcheck
library could remain compatible with an older version like 1.10, and you'd only need 1.15 if you wanted to use #[derive(Arbitrary)]
.
I think the key puzzle is how to handle additional constraints, such as the ones I described in #163. Maybe we should try brainstorming what these constraints look like in typical cases. Here's my example:
#[derive(Arbitrary, Clone)]
pub struct Rect {
left: usize,
top: usize,
width: usize,
height: usize,
}
Constraints: left + width
and top + bottom
both fit in a usize
.
We can actually add attributes like #[arbitrary = "..."]
or #[constraint = "...code..."]
if we wish, and use them to control the code we generate (much like serde
does). But I think it would work better to collect more examples of constraints before trying to design the actual syntax.
Big research dump ahead!
This discussion has links to some prior art and research. There's also an interesting discussion of the tradeoffs involved in default instance distribution, although this mostly seems to affect the size distribution of recursive data structures such as lists (that are considerably rarer in Rust than Haskell). The two papers they mention can be found here and here. While these papers look interesting, I'm not sure if they're directly relevant.
So there are some reasons why #[derive(Arbitrary)]
seems slightly sketchy in Haskell, but these reasons don't seem to be especially relevant to Rust.
Another question is the relationship between #[derive(Arbitrary)]
and instance invariants. For example, my Rect
type should never have left + width
large enough to overflow usize
, regardless of whether it was created by Arbitrary
or by a new
method—or by updating it with a &mut self
method. This could be represented by a trait Invariant
with a method that checks whether the invariant holds. (There are sketches of related ideas here (design by contract) and here (a wrapper type which enforces restrictions on a wrapped type), but neither is mature enough for what we need here.)
So there are a bunch of ways to tackle this, but it's hard to proceed without more examples. So let's dig a bit:
helper::reject
function which is used to filter out individual field values that fail constraints. Here's another example. (There are also quite a few simpler impls that don't need constraints.)Here's a code snippet from nalgebra
demonstrating reject
:
#[cfg(feature="arbitrary")]
impl<N, S> Arbitrary for PerspectiveBase<N, S>
where N: Real + Arbitrary,
S: OwnedStorage<N, U4, U4> + Send,
S::Alloc: OwnedAllocator<N, U4, U4, S> {
fn arbitrary<G: Gen>(g: &mut G) -> Self {
let znear = Arbitrary::arbitrary(g);
let zfar = helper::reject(g, |&x: &N| !(x - znear).is_zero());
let aspect = helper::reject(g, |&x: &N| !x.is_zero());
Self::new(aspect, Arbitrary::arbitrary(g), znear, zfar)
}
}
So custom Arbitrary
impls seem to fall into several categories:
reject
to meet various constraints.I think it would be both desirable and possible to automate (1) somehow, but that's a separate discussion. I'm interested in (2) and (3). I could imagine a syntax like:
#[derive(Arbitrary, Clone)]
pub struct Rect {
left: usize,
top: usize,
#[arbitrary(constraint = "left.checked_add(width).is_some()")]
width: usize,
#[arbitrary(constraint = "top.checked_add(height).is_some()")]
height: usize,
}
A naive implementation might write:
let left = g.gen();
let width = g.gen_constrained(|width| left.checked_add(width).is_some());
let top = g.gen();
let height = g.gen_constrained(|height| top.checked_add(height).is_some());
Rect { left: left, width: width, top: top, height: height }
...but this would tend to fail if left
is picked too close to 2^64-1, unless width
generates 0 values fairly often. So we might want (not actually valid code; it needs to continue
somewhere):
for _ in 0..MAX_TRIES {
let left = g.gen();
let width = g.gen_constrained(|width| left.checked_add(width).is_some());
let top = g.gen();
let height = g.gen_constrained(|height| top.checked_add(height).is_some());
return Rect { left: left, width: width, top: top, height: height };
}
panic!("could not generate instance of Rect; check your constraints");
Anyway, this is just a very rough sketch. But it would work for the examples of cases (2) and (3) that I found, and it wouldn't be especially hard to implement on stable Rust. What are some possible ways we could tweak this to make it better?
I'm gonna take a swing at this without constraints; I'll post here if I get it working.
https://github.com/remexre/quickcheck_derive seems to be working? If people want to try it (and rate my first proc-macro), feel free. I'll probably move it into this repo's workspace and PR once I've implemented constraints and shrinking.
edit: Use https://github.com/remexre/quickcheck/tree/derive instead, I'm planning to PR soon
What should the behavior be for trying to derive Arbitrary on an empty enum? It doesn't really make sense to do, so maybe it should be disallowed? But I might want to have e.g. an Arbitrary Either<A, Void>
or another generic enum that contains a Void
.
Right now, it results in a run-time panic, which is probably the least optimal solution.
Also, I've done a bit of work on constraints, and it's not been great. I'd actually like to see Shrinkable
as a separate trait, and valid(&Self) -> bool
(with a default impl of always-true) on the Arbitrary
trait; I don't think there's a reasonable/sane way to allow the user to specify one function of a derive for shrinking, and the checking for constraints is essentially orthogonal to generating the values, and the logic gets thorny when I try to put them in one function.
I'm going to push what I've got (as https://github.com/remexre/quickcheck_derive/tree/constraints-dead-end), then work on a the above approach.
If anyone has better ideas, certainly share them.
Maybe to handle empty enums properly, Arbitrary
API
fn arbitrary<G: Gen>(g: &mut G) -> Self;
should be changed to return Option<Self>
or Vec<Self>
?
We're not changing the API of the Arbitrary
trait to accommodate a niche corner case like void types. Panicing seems fine to me, since if that panic occurs, there's a bug somewhere, no?
@remexre Not sure what to do about Shrinkable
because I don't really understand the problems you're facing. I don't have time to dive into your code, so if you could describe the problem here I might be able to give feedback. :-)
I don't see trying to test an enum containing a Void (possibly because of generic parameters) as a bug in the user's code per se, but I agree that it probably doesn't merit an API change. The problem is that the code gets a lot more complex when I'm trying to figure out whether an enum branch is possible or not as I'm doing codegen. I'm leaning towards it being a compile-time error for now; the panic is from rand, not quickcheck, so it's a bit bad UX.
The problem with Shrinkable
is that I can forsee users wanting custom shrink()
implementations, but still wanting to derive arbitrary()
. For example, num::Complex
and num::rational::Ratio
probably want different shrink()
implementations, despite having essentially the same internal layout (a struct of two T's). Working around this with an attribute #[arbitrary(shrink_with = "my_custom_function")]
probably would be possible, but a bit ugly, I suppose.
Splitting Shrinkable
into a different trait, though, would let users impl Shrinkable
separately from the derive, and a separate derive for Shrinkable
could be added for those who want the default behavior.
@remexre RE shrinkable: That's interesting. I grant that the use case is valid. I wonder how often that's required though, and when it is, I imagine writing out the Arbitrary
impl wouldn't be that onerous. Generating values tends to be much easier than shrinking in my experience. That is, as a user, I probably want to derive Arbitrary
to get automatic shrinking rather than automatic generation.
But, I am open to splitting the trait, but I'm kind of leaning towards "there's not enough justification" presently...
Okay, for now I'll write it so I could add #[arbitrary(shrink_with = "my_custom_function")]
later if need be. I also totally forgot that you could have nested functions, so I can use that for valid(&self) -> bool
. I should have a basic PR ready by next Sunday, although that depends how my work schedule ends up.
This can probably be closed when #175 gets merged.
I note that there now is a third-party quickcheck_derive
crate, by @panicbit and @llogiq.
@8573 The implementation in #175 looks more feature rich
I want quickcheck to have a compiler plugin that automatically generates Arbitrary impl for structs and enums composed of Arbitrary types, like with serialisation.
That should shrink
to