Open fegu opened 4 years ago
I'm pretty sure this is because the parser is parsing one Word8 at a time with takeWhile1 isSafe. The isSafe function is defined in terms of Data.Char.isControl, and it so happens that, for example, 'Ä' is encoded as 0xc3 0x84, and 0x84 is considered to be a control character.
https://github.com/chrra/iCalendar/blob/master/Text/ICalendar/Parser/Content.hs
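This can be replicated with this small snippet:
-- Imports assumed from the qualified names used below; the lazy ByteString variant behaves the same.
import qualified Data.ByteString.Char8 as B
import Data.Char (isControl)
import qualified Text.Parsec as P

main :: IO ()
main = do
  contents <- B.readFile "test"
  -- Parse byte by byte and pair every byte (as a Char) with its isControl result.
  let Right x = P.parse (map (\c -> (c, isControl c)) <$> P.many P.anyChar) "test" contents
  mapM_ print x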
With the contents of test being AÄAaäa.
> :main
('A',False)
('\195',False)
('\132',True)
('A',False)
('a',False)
('\195',False)
('\164',False)
('a',False)
('\n',True)
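The output above generalizes beyond 'Ä': isControl is True for U+0080 through U+009F, so any character whose UTF-8 encoding contains a continuation byte in 0x80..0x9F would be rejected by a byte-wise check. A minimal sketch using only base; utf8Bytes and trippedByIsControl are hypothetical helper names, not part of the iCalendar library:

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, isControl, ord)

-- Hand-rolled UTF-8 encoder (illustration only), returning byte values as Ints.
utf8Bytes :: Char -> [Int]
utf8Bytes c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  where
    n = ord c
    -- Continuation byte: 10xxxxxx carrying the low six bits of x.
    cont x = 0x80 .|. (x .&. 0x3F)

-- True when some byte of the encoding, read as a single Char, is a control character.
trippedByIsControl :: Char -> Bool
trippedByIsControl = any (isControl . chr) . utf8Bytes

main :: IO ()
main = mapM_ report "AÄäØ"
  where
    report c = putStrLn (show (utf8Bytes c) ++ " " ++ show (trippedByIsControl c))
```

Under these assumptions 'Ä' ([195,132]) and 'Ø' ([195,152]) are flagged, while 'A' ([65]) and 'ä' ([195,164]) are not, which matches the repro output.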
Indeed, I was able to mitigate this issue by changing the definition of TextParser in Calendar/Text/ICalendar/Parser/Common.hs from
type TextParser = P.Parsec ByteString DecodingFunctions
to
import qualified Text.Parsec.Text.Lazy as TP
type TextParser = TP.Parser
and addressing all the type errors (mostly just changing ByteString to Text).
I do not know why a ByteString parser was originally used instead of a Text parser, so this change might cause unexpected bugs somewhere else, but at least this issue is resolved.
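To illustrate why switching to Text helps, here is a rough sketch comparing the same isControl check on raw bytes and on decoded text. It assumes the bytestring and text packages; sample, bytewiseHasControl and decodedHasControl are made-up names for this example and do not appear in the library:

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import Data.Char (isControl)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- "CN=Øyvind" as raw UTF-8 bytes; Ø is 0xC3 0x98.
sample :: BS.ByteString
sample = BS.append (BS.pack [0x43, 0x4E, 0x3D, 0xC3, 0x98]) (BC.pack "yvind")

-- Byte-wise check: each byte is viewed as one Char, so 0x98 counts as control.
bytewiseHasControl :: BS.ByteString -> Bool
bytewiseHasControl = BC.any isControl

-- Decode first: the two bytes become the single code point U+00D8.
decodedHasControl :: BS.ByteString -> Bool
decodedHasControl = T.any isControl . TE.decodeUtf8

main :: IO ()
main = do
  print (bytewiseHasControl sample)
  print (decodedHasControl sample)
```

The byte-wise check reports a control character and the decoded check does not, which is consistent with the Text-based parser accepting the input.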
When trying to parse this UTF-8 line with the default decoding (also UTF-8):
ATTENDEE;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=TRUE;CN=Øyvind:mailto:oyvind@somedomain.no
it fails with
Left (line 24, column 67):\nunexpected \"\152\"\nexpecting \"\r\", \"\n\", ',', ';' or ':'"
The Nordic Ø is here correctly UTF-8-encoded as \195\152 (0xC3 0x98), and the parser chokes on \152.
My quick fix for now, since we don't actually use the names for anything in our application, is to just search/replace the bytestring first.
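The exact replacement used is not stated. One possible shape of such a workaround, as a lossy sketch only, is to overwrite every non-ASCII byte before handing the input to the parser. asciiOnly is a hypothetical helper, and this destroys all non-ASCII names, so it only fits applications that ignore them:

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import Data.Word (Word8)

-- Hypothetical helper: replace every non-ASCII byte with '?' (0x3F).
asciiOnly :: BS.ByteString -> BS.ByteString
asciiOnly = BS.map scrub
  where
    scrub :: Word8 -> Word8
    scrub w
      | w >= 0x80 = 0x3F
      | otherwise = w

main :: IO ()
main =
  -- "CN=Øy" as UTF-8 bytes; prints CN=??y because Ø occupies two bytes.
  BC.putStrLn (asciiOnly (BS.pack [0x43, 0x4E, 0x3D, 0xC3, 0x98, 0x79]))
```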