gallais / idris-tparsec

TParsec - Total Parser Combinators in Idris
https://gallais.github.io/idris-tparsec/
GNU General Public License v3.0
93 stars 10 forks source link

[ fix #35 ] a JSON parser #36

Closed gallais closed 4 years ago

gallais commented 4 years ago

I struggled a lot writing jsonArray before giving up and writing the current version. We need a better way to take something of type

a :-> b :-> c

and turn it into

Box a :-> b :-> Box c

Is it possible in general? What if we make sure all the a, b, c are of type Parser? We ought to be able to somehow prove that if a function is going to be called in a guarded position then it cannot possibly call its arguments in an unguarded position.

gallais commented 4 years ago

Still missing: the object part of the parser. And sprinkling optional spaces in a lot of places, I presume.

gallais commented 4 years ago

Okay, this should do it. All we are missing is an example!

clayrat commented 4 years ago

I'm pretty sure exponents can have explicit +, i.e. 0.0314E+2 should be accepted too, but I'm not sure how to deal with it. Should we accept prefix + in decimalInteger, or replicate that parser with another case for + just in decimalDouble?

gallais commented 4 years ago

I guess I'd be in favour of optand ('+' <|> '-') decimalNat.

clayrat commented 4 years ago

Hmm, I ran some tests at random from https://github.com/nst/JSONTestSuite/ and it seems we have a general problem with u-escaped Unicode:

> json "[\"\\u041f\"]"
Just (JArray [JString "u041f"]) : Maybe JSON

whereas:

> json "[\"П\"]"
Just (JArray [JString "\1055"]) : Maybe JSON

and running a compiled parser:

./jsn "[\"П\"]"
Just ["\u041F"]

Not sure if we should fix it here or leave as a separate feature request :)

gallais commented 4 years ago

Oh, there's even yet another RFC? So far I have seen:

and this github repo mentions

Is that the most recent one?

Anyway, that's a good point: I completely screwed the escaped unicode characters. I think it's worth fixing. I guess we could keep the current one as asciiStringLiteral or something like that. Because not everyone will want to pay the cost of dealing with escaped unicode characters, I presume.

I'll look into it.

clayrat commented 4 years ago

Yes, apparently there are 3 RFCs (actually 4, but one of them is just a typo fix) and 3 additional documents, see the blogpost that accompanies the repo: http://seriot.ch/parsing_json.php#1