Closed shepmaster closed 9 years ago
Yeah, this definitely needs to be fixed. I think the ASCII limitation has been there and nobody has had cause enough to fix it.
There's basically two concerns here. The first is ergonomics. It's really nice when your witnesses are limited to characters that are likely to be meaningful in your terminal (i.e., printable characters). The second is that, of course, strings outside the ASCII range should be tested. When I first wrote quickcheck, I didn't know how to balance these so I just chose to address the first concern.
It seems like the principle of least surprise should apply here: String
should be fuzzed with the set of Unicode scalar values and some other type (perhaps defined in quickcheck, i.e., AsciiString
) should be more limited when you want a guarantee of nicer witnesses. I think it could deref to &String
, which should mostly provide the ergonomics of a plain String
.
It seems like shrinking could also come into play. I could conceive of a string with only ascii as being "smaller" than one with ascii + ascii punctuation. Then it could "grow" to include more common unicode, then "grow" towards uncommon. Just thinking outloud, really.
Has this changed? I think I saw wildly different results than this.
Just as a data point, this is what I ended up using for a string search testing / fuzzing:
#[derive(Clone, Debug)]
struct Text(String);
//static ALPHABET: &'static str = "ABCD";
static ALPHABET: &'static str = "\0\u{1}\u{2}\u{3}";
impl Arbitrary for Text {
fn arbitrary<G: qc::Gen>(g: &mut G) -> Self {
let len = u16::arbitrary(g);
let mut s = String::with_capacity(len as usize);
for _ in 0..len {
let i = usize::arbitrary(g);
let i = i % ALPHABET.len();
s.push(ALPHABET.as_bytes()[i] as char)
}
Text(s)
}
fn shrink(&self) -> Box<Iterator<Item=Self>> {
Box::new(self.0.shrink().map(Text))
}
}
Since the algorithm was working on just the byte slice, I switched to an alphabet of 0-4 for even easier to read intermediate debug outputs..
@bluss it appears so! This could be considered closed via 848acee19b180b44f7cf1d74ac431bd68efd8667.
Errm, right, I forgot to update this issue, but I grew tired of the ASCII-only behavior so I just changed it.
ASCII-only is still useful though, so we might want an AsciiString
type or something.
Right now,
String
s only have ASCII alphanumeric characters. I tried to use it to generate strings with spaces in it and was surprised when my code was never called :smiley_cat:It would be cool if there were more of the Unicode set included. Even awesomer might be if we had some tuning knobs where we could say "more punctuation, less letters", perhaps using the Unicode character classes?
Thanks for the fun-to-use library!