Closed elliot-u410 closed 1 year ago
As an example, this is what I ended up writing to generate random file paths:
func genFilePath() *rapid.Generator[string] {
pathCharacter := rapid.Rune().Filter(func(r rune) bool { return !lo.Contains([]rune{filepath.Separator, 0}, r) })
pathComponent := rapid.Transform(rapid.SliceOfN(pathCharacter, 0, 50), func(x []rune) string { return string(x) })
return rapid.Transform(rapid.SliceOfN(pathComponent, 0, 20), func(x []string) string { return filepath.Join(x...) })
}
I only encountered this with StringMatching
. I assumed that was built on top of the other String generators but that may not be true.
Ok I constructed a test case that shows the differences I'm seeing. I still don't understand why this is happening and it's quite possible an error on my part:
func TestStringMatching(testing *testing.T) {
rapid.Check(testing, func(t *rapid.T) {
pathCharacter := rapid.Rune().Filter(func(r rune) bool { return !lo.Contains([]rune{filepath.Separator, 0}, r) })
str := rapid.Transform(rapid.SliceOfN(pathCharacter, 0, 50), func(x []rune) string { return string(x) }).Draw(t, "file")
assert.Assert(t, utf8.ValidString(str))
dir := testing.TempDir()
_, err := os.OpenFile(filepath.Join(dir, str), os.O_CREATE, 0700)
assert.NilError(t, err)
})
rapid.Check(testing, func(t *rapid.T) {
str := rapid.StringMatching("[^/\x00]+").Draw(t, "string")
assert.Assert(t, utf8.ValidString(str))
dir := testing.TempDir()
_, err := os.OpenFile(filepath.Join(dir, str), os.O_CREATE, 0700)
assert.NilError(t, err)
})
}
This fails in the second scenario with an error like
assertion failed: error is not nil: open /var/folders/c6/zs7j5_n171g9vhw3sgkm0pzm0000gn/T/TestStringMatching955838286/1673/: illegal byte sequence
We can verify that StringMatching
is behaving as expected:
func TestStringMatchingIssue47(t *testing.T) {
Check(t, func(t *T) {
s := StringMatching("[^/\x00]+").Draw(t, "s")
if !utf8.ValidString(s) {
t.Fatalf("not a valid UTF-8 string: %q", s)
}
if strings.ContainsAny(s, "/\x00") {
t.Fatalf("%q contains / or NUL", s)
}
})
}
> go test -v -rapid.checks=10_000_000 -run=47
=== RUN TestStringMatchingIssue47
strings_external_test.go:22: [rapid] OK, passed 10000000 tests (21.841782396s)
--- PASS: TestStringMatchingIssue47 (21.84s)
PASS
ok pgregory.net/rapid 21.846s
It is probably the case that os.OpenFile
on your operating system has more restrictions on file names than any non-NUL non-slash UTF-8 bytes. If you run your first test long enough, it will probably fail, too -- I see no major difference in what they should generate.
Through some testing I can confirm that StringMatch
has a much higher probability of producing unique samples than does the filtered-rune approach. Sampling 10,000,000 runes, the filtered-rune approach only produced 100,000 unique examples whereas StringMatch produced 800,000.
Did you find out what are the restrictions on the valid filenames on your OS? Would be interesting to know.
I am on APFS and according to my research it allows all Unicode symbols (even /
and NUL). According to some sources, some kernel APIs do not allow file names with those two symbols, hence why I filtered them out. I did try running as many tests against the filtered-rune approach with OpenFile
. It never found a failure case but it crashed after about 30 minutes for an unrelated reason. However, based on my above test, it may be that I am just testing the same 100,000 runes over and over and not reaching the ones that would cause failure.
This is either a bug or a feature. If you view a
string
generator asSliceOf(Byte).Transform(string)
then it's correct and probably finds strange edge cases in functions that should be fixed. If instead you view thestring
generator asSliceOf(Rune).Transform(string)
then it's producingstring
s that are not UTF-8 compatible. I haven't dug too deeply into this. I just know that I had to start building my own string generators for this reason.