gallais / idris-tparsec

TParsec - Total Parser Combinators in Idris
https://gallais.github.io/idris-tparsec/
GNU General Public License v3.0
93 stars 9 forks source link

Bug in the string literal parser? #42

Open gallais opened 3 years ago

gallais commented 3 years ago

While I was porting the JSON parser to Agda yesterday I think I have found a bug in the string literal parser:

https://github.com/gallais/idris-tparsec/blob/75b288719b9781273a595c294b7d3bed5ea1904d/src/TParsec/Combinators/Chars.idr#L154-L156

After having read a \ we will only accept a u followed by 4 hexadecimal digits aka. a unicode character. Shouldn't we also accept:

gallais commented 3 years ago

Ok this is pretty sneaky: andoptbind anyTok $ \ c => ... means we'll consume any token and succeed even if the continuation fails (in which case we return (c, Nothing)). This means that we accept any character following a \, and only have a special behaviour if it happens to be u aka the start of a unicode character.

Not sure how I feel about that.