Closed maxbla closed 3 years ago
For the purpose of semantic versioning, this PR (or any like it) should be considered a breaking change, as code may have (unwittingly) depended on the fact that integers were in -100..=100
.
as code may have (unwittingly) depended on the fact that integers were in -100..=100.
There is also code wittingly using usize::arbitrary()
to size data structures.
@bluss yep, intentional dependence on $integer_type::arbitrary()
to limit the range to gen::size()
is a real backwards compatibility problem for this PR (which must consequently come with a minor semver increment).
This PR adds one workaround (for now) in this crate, for a type that clearly needs it. std::time::Duration
's Arbitrary
implementation depends on u64
and u32
's Arbitrary
. the workaround is just _ % gen.size()
(see commit abdd617) (also I think it should be using gen.gen_range()
instead).
There is another category of types, including std::net::Ipv4Addr
, where I'm not sure what the behavior should be -- what is the size of an ipv4 address? I think its not entirely clear and I'm leaning towards IpAddr
ignoring gen::size()
completely. note: the previous behavior was to respect gen:size() for each individual u8
that goes into the constructor of the ipv4 (or u16
/ipv6 address), so 101.101.101.101
through 255.255.255.255
would not be selected with the default gen::size() of 100.
A third category is for types that clearly should ignore gen::size()
, including all integer types, when they are called directly.
For crates that define a type, implement Arbitrary
on that type, and depend on integer types' implementation of Arbitrary
to size
their own type, the situation is not great. Those crates will have to make a decision to either use gen.gen_range(0, gen.size())
(or equivalent) or use the new behavior by default (probably bumping their semver).
gen::size()
is necessary is for types whose memory or CPU usage is dependent on gen::size()
(like vec) and it is a good api for types that are overwhelmingly used for small values (Duration
). Any types outside these problem categories could be free to disregard gen::size()
. This is a substantial change, requiring documentation and possibly marker traits.
If you want to show me some code I'm breaking to give me a little perspective, I'm all ears.
Good point about the _ % gen.size()
workaround, although the renewed focus on problem values sounds like it is a problem for this (size is weirdly distributed then).
I have no special reason to share this petgraph code (that should be trivial to port), but it uses usize::arbitrary
in a now-misguided way, and can easily migrate to using .gen_range()
.
Though I think that in the long run, I would prefer if users of quickcheck used quickcheck
as the only source of randomness and randomness API (maybe it's called arbitrariness for quickcheck). Reexporting rand is a valid way to do that, but that's just one of the ways.
To me it would make sense that this is more quickcheck specific. For example, .gen_size_range()
gives you an *arbitrary size compatible with current limits. Then we have quickcheck-controlled sizes and don't have to hardcode them as uniformly distributed, for example.
Yeah, _ % gen.size()
isn't perfect because of special values. It also isn't great because it can mess with the distribution. For example with u8 and when gen.size()
is 100
, each number in 0..55
has a greater chance to be picked than one in 56..99
(3/255 vs 2/255). Of course, this difference approaches zero as the range of the underlying type is much bigger than gen.size()
.
I agree that third party crates should be able to depend only on Quickcheck for randomness (at least in their respective implementations of Arbitrary
) (especially since we're considering dropping rand
#241 ), but I'm unsure of the best way to achieve that goal.
It would make sense to wrap gen.gen_range()
in some way, but that way can't involve adding requirements to Arbitrary
(like RngCore
). I could see there being another trait like RawArbitrary
that generates arbitrary values without special values and requires explicit sizing, but this feels a bit over-engineered (and more difficult to maintain). Is it possible that the extra weirdness from special/problem values doesn't really matter and we can just ignore it, suggesting people use the _ % gen.size()
trick (or similar) to fit their needs?
While formulating a previous response, I looked through the types that Arbitrary is implemented on in this crate and realized I might have some issues
type | issue | solution |
---|---|---|
f32 /f64 |
still respects gen.size() | future PR to address desired behavior of floats |
Range<T> /Bound<T> |
Not sure if it should respect gen::size() for integer types | wait and think |
SocketAddrV6 |
I don't know what flowinfo or scope_id are in constructor. Is it legal for them to take any value? seems like yes, but unsure |
wait for review |
@BurntSushi and/or @bluss I would appreciate guidance on what should happen to Range<T>
/Bound<T>
(when T is some integer type)
The basic options are
gen::size()
(current behavior, easy)gen::size()
% gen.size()
trick won't work unless I add additional trait bound of std::ops::Rem
to Range<T>
. Most uses of RangeRange<Ipv4Addr>
would no longer have an arbitrary implementation)gen.size()
, involving lots of macrosFor Range<T>
My thinking was that since it was often used as the index for a slice, it might break code that contains something like array[Arbitrary::<Range<usize>>()]
, and there would be no perfect way for a user to fix that (and still have a uniform distribution). I think relying on the underlying type makes the most sense, but I understand how this might be a pain point for some users.
For SocketAddrV6
I don't know what range of values it is allowed to take (if all values are valid, that's good, b/c that's how this PR works right now)
Interesting. I personally find array[Arbitrary::<Range<usize>>()]
to be a bit weird. Namely, it seems just as weird as array[Arbitrary::<usize>()]
, which would similarly fail to work. I think if we're okay squashing the latter, then we should be okay squashing the former. (I'd really expect someone to write array[Arbitrary::<usize>() % array.len()]
instead I think. And one could do the same with ranges with a helper function.
As for SocketAddrV6
, I don't see any documented restrictions, so it seems fine to me!
It would be a little crazy to ignore size for integers but not for floats!
Yeah, it would. I'll take a crack at it tomorrow. It seems like a fairly quick fix.
It seems like a fairly quick fix.
I take it back.
I experimented a bit, but I think to match user expectations, the distribution needs to generate numbers in ..., (-100, -10), (-10, -1), (-1, 0), (0,1), (1,10), (10,100), ... with equal probability. One distribution with this property is the log-uniform distribution. I couldn't find a crate with this distribution, but dependencies are bad and it was easy to fudge.
At this point, I have addressed all the review I received so far.
This PR now fixes #27
Are you waiting for me to resolve merge conflicts with a merge commit or rebase?
Or is this waiting on @bluss? Whether it is or not, I think the float code could do with review.
Please don't wait for me :) Thanks for working on good solutions
@BurntSushi This PR is ready to be merged. I'm curious what you think of the log-uniform distribution for floats.
Is there a published fork with this bug fixed? It's a deal-breaker.
In the meantime, to those looking for a workaround, I advice to do something like this:
impl Arbitrary for Pos {
fn arbitrary<G : Gen>(g : &mut G) -> Pos {
let workaround_seed : u8 = random();
if workaround_seed % 10 == 1 {
Pos(i8::MAX, i8::MAX)
} else if workaround_seed % 10 == 0 {
Pos(i8::MIN, i8::MIN)
} else {
Pos(i8::arbitrary(g), i8::arbitrary(g))
}
}
}
@cognivore this PR is being made from my personal fork, so you could use that for now, but I don't advise it. I have no plans to maintain a fork of quickcheck and I will probably delete my fork soon after this PR is merged (but may create a new fork to make additional changes).
Regarding your workaround, if you're using quickcheck internally i.e. not exposing a struct that implements Arbitrary in the public API of a library, the following is nicer workaround in my opinion. Be careful not to use this workaround if your test function takes a Vec<T>
as a parameter, as arbitrary will then generate the maximum length Vec
s, eating all your memory.
let gen = StdThreadGen::new(usize::max_value());
let mut qc = QuickCheck::with_gen(gen);
qc.quickcheck(test_function as fn(u8, u8) -> TestResult);
Any movement on this? I'd be happy to help get this merged if need be.
Also interested in what's the status of this, is anything blocking merging it in? (3 out of 10 open issues are also directly related to this)
All right, I've managed to bring this into my rollup branch. Note that for the future, when submitting reasonably small patches like this, please keep them to a single commit and force push. Having 16 commits in this branch tracking every winding change along with merges of master into the branch made it a real pain to squash it down into one commit. With that said, thank you for sticking with this PR over such a long time period! Apologies for taking a long time to merge it. QuickCheck isn't high on my list of crates to maintain.
Aims to fix #119, #190 and #233. Changes the implementation of
Arbitrary
for numbers to ignore thesize
ofgen
, and instead pick uniformly from the entire range of the type.Todo:
Gen
'ssize()
in documentation