aisamanra / s-cargot

Elaborate and expressive S-Expression library for Haskell
Other
61 stars 6 forks source link

Trouble parsing arrow notation, -> #14

Open ckoparkar opened 5 years ago

ckoparkar commented 5 years ago

Hi @aisamanra, we're using haskLikeParser to parse s-expressions, and I ran into a problem while trying to parse the arrow notation, ->. Specifically, the parser considers - and > to be two separate atoms. So the parse result is [RSAtom (HSIdent "-"), RSAtom (HSIdent ">")]. Whereas I thought it would be [RSAtom (HSIdent "->")]

pHaskLikeAtom uses parseR5RSIdent to decide what constitutes an atom.

https://github.com/aisamanra/s-cargot/blob/71d2ec896f50877110df8622de3efdccfa6b8d59/Data/SCargot/Common.hs#L49-L54

And sure enough, - is missing from initial. Was this omission intentional ?

I don't have any experience writing parsers to tell if adding - there is something sensible. Is it going to break some corner case of parsing -INTs ? As far as I understand, it's OK wrt the spec ?

I made that change and tried it on a small example which uses ->s, negative numbers and abc-defss and that worked fine. Also, all s-cargot tests still pass, but they're not really using -s I think.

For now, I'm going to use ~> or => or A '-' : A '>' to mean arrows, but I could whip up a PR to make this change. What are your thoughts ?

[EDIT: I apologize, I did not see your GitHub status before I tagged you.]

aisamanra commented 5 years ago

Sorry about the delay in responding here!

The omission of - is intentional, but for pedantic reasons: I wanted to choose a standard identifier syntax that wasn't Haskell's so you could use a typical kebab-case identifiers (i.e. so you could use a haskLikeParser and still be able to write expressions like this-and-that in it), and so I chose R5RS identifiers, which are simpler than the identifiers that appear in some later Schemes: R7RS identifiers, for example, can include strings inside vertical bars (e.g. |this and that| is a single identifier in R7RS). Strictly speaking, -> isn't a valid identifier according to the R5RS spec or the R6RS spec, both of which have special treatment for the standalone identifier - and otherwise only allow hyphens as the non-initial character. (You can see that the definition of parseR5RSIdent heavily mirrors the spec, including the choice of names like peculiar.)

That said, I think it makes sense to allow this! I think for the sake of being pedantic, I'll define a new atom parser called something like parseR5RSIdentLax, so it can specify that it's not a strict implementation of the R5RS identifier rule, and so it can include a comment about the specific way it differs from the spec, and then update haskLikeParser to use that one instead. If you've got a PR on hand that makes the change, I'd be happy to build on that; otherwise, I'll happily put that in and get a new release within the week!

ckoparkar commented 5 years ago

re:why R5Rs - Ah, it makes sense now. That didn't occur to me.

Gosh, I wasn't even looking in the correct place. What I linked to is just a list of identifiers, not the AST! I don't know what I was thinking. Sorry about that, and thanks for pointing me to the right thing.

Now that I know it isn't R5RS at all, I feel strange for requesting this change. Parsing without it isn't bad -- I had to look for [-,>] instead of -> and it worked nicely.

That said, I think it makes sense to allow this!

You must have already considered this, but please only add it if you think it plays nicely with other things :)

re:parseR5RSIdentLax Absolutely! Unfortunately, I don't have anything PR ready. I only messed around a bit with parseR5RSIdent, but that's it.