Closed RyanGlScott closed 7 years ago
I can reproduce this. Here are my results in case they help:
*** Failed! Falsifiable (after 64 tests):
attoparsec
'\698397'
'\GS'
*** Failed! Falsifiable (after 91 tests):
attoparsec
'\1059932'
'\\'
I'm wondering if this is ultimately due to a ByteString
encoding issue. The attoparsec
parser in the test suite encodes its input string using Data.ByteString.Char8.pack
. However, in all of these failing test cases, it's given a codepoint outside of its range. For example, we can see in GHCi that:
λ> import qualified Data.ByteString.Char8 as B8
λ> B8.pack ['\698397']
"\GS"
Notice that it didn't display '\698397'
—it displayed '\GS'
! This is significant, since '\698397'
and '\GS'
were a pair of failing inputs from https://github.com/ekmett/parsers/issues/68#issuecomment-318958068. This makes sense, since if the parser mistakenly believes that '\698397'
is equal to '\GS'
(after being encoded as a ByteString
), then the test will go awry.
I think we can fix the issue by simply using the Text
version of parseOnly
instead, which appears to dodge this encoding issue. At least, I've tried running the examples in this issue, as well as the QuickCheck
tests, with the necessary changes to use Data.Attoparsec.Text
, and I haven't ran into any failures yet.
On rare occasion, the
quicktest
test suite will fail. Here's an example from a recent Travis build:That was on GHC 7.8.4, but the GHC version appears to be unimportant, since I can reproduce the issue on 8.2.1 as well: