ekmett / parsers

Generic parser combinators

Intermittent QuickCheck test failures #68

Closed: RyanGlScott closed this issue 7 years ago

RyanGlScott commented 7 years ago

On rare occasions, the quickcheck test suite fails. Here's an example from a recent Travis build:

Test suite quickcheck: RUNNING...
*** Failed! Falsifiable (after 33 tests): 
attoparsec
'\721424'
'\DLE'

That was on GHC 7.8.4, but the failure does not seem to depend on the GHC version, since I can reproduce it on GHC 8.2.1 as well:

*** Failed! Falsifiable (after 100 tests):
attoparsec
'\1044548'
'D'

gwils commented 7 years ago

I can reproduce this. Here are my results in case they help:

*** Failed! Falsifiable (after 64 tests): 
attoparsec
'\698397'
'\GS'
*** Failed! Falsifiable (after 91 tests): 
attoparsec
'\1059932'
'\\'

RyanGlScott commented 7 years ago

I'm wondering whether this is ultimately a ByteString encoding issue. The attoparsec parser in the test suite encodes its input string using Data.ByteString.Char8.pack, which can only represent code points 0 to 255 and truncates every Char to its low 8 bits. In all of these failing test cases, the parser is given a code point outside that range. For example, in GHCi:

λ> import qualified Data.ByteString.Char8 as B8
λ> B8.pack ['\698397']
"\GS"

Notice that it displayed '\GS', not '\698397'! This is significant, because '\698397' and '\GS' were one of the pairs of failing inputs in https://github.com/ekmett/parsers/issues/68#issuecomment-318958068. Keeping only the low 8 bits means 698397 mod 256 = 29, which is exactly the code point of '\GS'; the other reported pairs follow the same pattern. If the parser effectively sees '\698397' as '\GS' once the input is encoded as a ByteString, the test will go awry.
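To double-check the truncation explanation against every failing pair reported in this thread, here is a minimal sketch that assumes only the bytestring package shipped with GHC. The names truncateToByte, failingPairs, and collidesAfterPack are hypothetical helpers for illustration; they are not part of the parsers test suite.

```haskell
import qualified Data.ByteString.Char8 as B8
import Data.Char (chr, ord)

-- Hypothetical model of what Data.ByteString.Char8.pack does to a
-- single Char: keep only its low 8 bits (code point mod 256).
truncateToByte :: Char -> Char
truncateToByte c = chr (ord c `mod` 256)

-- The pairs of characters printed in the failure reports above.
failingPairs :: [(Char, Char)]
failingPairs =
  [ ('\721424',  '\DLE')
  , ('\1044548', 'D')
  , ('\698397',  '\GS')
  , ('\1059932', '\\')
  ]

-- True when the two characters become indistinguishable once packed
-- into a ByteString with Char8.pack.
collidesAfterPack :: (Char, Char) -> Bool
collidesAfterPack (a, b) = B8.pack [a] == B8.pack [b]
```

Loaded into GHCi, `all collidesAfterPack failingPairs` should evaluate to True, as should `all (\(a, b) -> truncateToByte a == b) failingPairs`: each large code point lands on its partner after truncation (721424 mod 256 = 16, 1044548 mod 256 = 68, 698397 mod 256 = 29, 1059932 mod 256 = 92).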

I think we can fix the issue by using the Text version of parseOnly (from Data.Attoparsec.Text) instead, which sidesteps this encoding issue because Text stores full Unicode code points rather than truncated bytes. At least, I've tried running the examples in this issue, as well as the QuickCheck tests, with the changes needed to use Data.Attoparsec.Text, and I haven't run into any failures yet.
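For contrast, a small sketch assuming the text and bytestring packages, showing why a Text-based input avoids the collision: Data.Text.pack keeps the full code point, so one of the reported pairs stays distinct, while Data.ByteString.Char8.pack merges it. distinctAsText, distinctAsChar8, and examplePair are hypothetical names used only for illustration.

```haskell
import qualified Data.ByteString.Char8 as B8
import qualified Data.Text as T

-- Hypothetical check: do two single-character inputs stay distinct
-- when encoded as Text? Text keeps the whole code point.
distinctAsText :: Char -> Char -> Bool
distinctAsText a b = T.pack [a] /= T.pack [b]

-- The same check for Char8.pack, which keeps only the low 8 bits
-- of each Char.
distinctAsChar8 :: Char -> Char -> Bool
distinctAsChar8 a b = B8.pack [a] /= B8.pack [b]

-- One of the failing pairs reported in this issue.
examplePair :: (Char, Char)
examplePair = ('\698397', '\GS')
```

In GHCi, `uncurry distinctAsText examplePair` should be True and `uncurry distinctAsChar8 examplePair` should be False. One caveat worth keeping in mind: Data.Text.pack replaces invalid scalar values (the surrogate range U+D800 to U+DFFF) with U+FFFD, so a Char generator that can produce surrogates could in principle still hit a similar, much rarer collision.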