djspiewak / sparse

Generalized, incremental parser combinators for scalaz-stream
Apache License 2.0

Sealing parsers prevents creating new external parsers #3

Closed mandubian closed 9 years ago

mandubian commented 9 years ago

I wanted to implement a not[Token](p: Parser[Token, Token]) but it's not possible outside of the project itself as everything is sealed for pattern matching.

Is sealed mandatory for parsers (or at least for Incomplete)?

djspiewak commented 9 years ago

I could open up Incomplete without any trouble, since it is effectively defined entirely by derive and complete. However, I should warn you that context-free grammars are not closed under complementation! In other words, a not parser of the form you indicated is not going to be sound, and probably will result in corner cases that infinitely loop (and similar).

What exactly is your use case for complementation? Maybe there's a different way it can be accomplished, or a new combinator that can be added to make things easier.
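The derive/complete shape mentioned above can be sketched in isolation. This is a hedged, standalone illustration with assumed names and signatures (`P`, `Done`, `Fail`, `Literal`), not the actual sparse definitions: a parser consumes one token at a time via `derive`, and `complete` yields a result once the parse can finish.

```scala
// Hedged sketch of a derive/complete interface (assumed shapes, not the
// actual sparse definitions).
sealed trait P[-T, +R] {
  def derive(t: T): P[T, R]   // advance the parser by one token
  def complete: Option[R]     // Some(result) if the parse may stop here
}

// a parse that has already succeeded with result r
final case class Done[R](r: R) extends P[Any, R] {
  def derive(t: Any): P[Any, R] = Fail
  def complete: Option[R] = Some(r)
}

// a parse that can never succeed
case object Fail extends P[Any, Nothing] {
  def derive(t: Any): P[Any, Nothing] = Fail
  def complete: Option[Nothing] = None
}

// matches exactly one specific token
final case class Literal[T](tok: T) extends P[T, T] {
  def derive(t: T): P[T, T] = if (t == tok) Done(t) else Fail
  def complete: Option[T] = None
}
```

Under this reading, a `not` combinator would have to enumerate the complement of a parser's language at each derivative step, which is exactly where the non-closure result bites.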

mandubian commented 9 years ago

Agreed on the potential infinite loops! I wanted to test writing parsers to get a feel for your API, e.g. parsing an EDN string such as "foo". So you would have a rule like:

'"' ~> repeat(not('"')) <~ '"'

Any other way to do that?

djspiewak commented 9 years ago

I would handle that by having a tokenizing step. So, there would be a StrLit token or similar that contains the contents of the string value. I have some code locally which does this. I'll push it up on a branch so you can take a look as soon as I get back home.

djspiewak commented 9 years ago

Here you go: https://github.com/djspiewak/sparse/blob/wip/json-example/src/main/scala/scalaz/stream/parsers/package.scala#L96 Basically, the tokenize function takes a set of rules (of the form Map[Regex, PartialFunction[List[String], Token]]) and produces a Process1 which does the tokenization for you (similar to the parse function). You can see an example of it in action here: https://github.com/djspiewak/sparse/blob/wip/json-example/src/test/scala/scalaz/stream/parsers/JsonStreamSpecs.scala#L55
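The rule-map idea can be illustrated with a standalone sketch. This is not the sparse API itself; the token names and the `tokenize` logic below are assumptions made for the example. It mirrors the `Map[Regex, PartialFunction[List[String], Token]]` shape by matching each regex at the head of the input and applying the partial function to the capture groups.

```scala
// Standalone illustration of a regex rule map (assumed names, not sparse's API)
import scala.util.matching.Regex

sealed trait Token
final case class StrLit(value: String) extends Token
final case class NumLit(value: Int) extends Token

object TokenizeSketch {
  // each regex's capture groups feed the partial function on the right
  val rules: Map[Regex, PartialFunction[List[String], Token]] = Map(
    "\"([^\"]*)\"".r -> { case s :: Nil => StrLit(s) },
    "(\\d+)".r       -> { case n :: Nil => NumLit(n.toInt) }
  )

  def tokenize(input: String): List[Token] =
    if (input.isEmpty) Nil
    else {
      val hit = rules.iterator.flatMap { case (re, mk) =>
        re.findPrefixMatchOf(input).map(m => (m.end, mk(m.subgroups)))
      }.toList.sortBy(-_._1).headOption   // prefer the longest match

      hit match {
        case Some((end, tok)) => tok :: tokenize(input.drop(end))
        case None             => tokenize(input.tail)  // skip delimiters
      }
    }
}
```

With this sketch, the embedded quotes never reach the parser at all, so no complementation is needed: the string-literal regex swallows them during tokenization.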

mandubian commented 9 years ago

Yesterday evening, I was reading your samples and thinking about tokenization with buffering too. I was thinking a bit too much in terms of combinators, I think ;)

BTW, a detail: EDN is cool for streaming since a document is allowed to be a succession of EDN values. In JSON, a root {} or [] is mandatory. JSON wasn't designed for streaming at all; it's actually a crappy format, yet almost universal now ;)

Thanks for your help, I'm going to investigate tokens now!


djspiewak commented 9 years ago

EDN is far superior to JSON for streaming. :-) I used JSON as an example mostly because I think it's probably the most common use case for something like this, at least in modern hipster servers. When streaming JSON, btw, it's pretty common to define a special "de facto non-standard" format that uses whitespace to delimit JSON tokens at the top level (instead of having a root array), and then normal JSON rules below that. We did this for JSON streaming when I was at Precog, and generally it works out pretty well and has fairly good compatibility across parsers.
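That whitespace-delimited convention (essentially what is now called NDJSON / JSON Lines) can be sketched in isolation. This is a simplified standalone split assuming one top-level value per line, not Precog's actual implementation:

```scala
// Sketch of the whitespace-delimited streaming convention: one top-level
// JSON value per line, no enclosing root array (assumed simplification).
object StreamingJsonSketch {
  def topLevelValues(chunk: String): List[String] =
    chunk.split("\n").toList.map(_.trim).filter(_.nonEmpty)
}
```

Each extracted line can then be handed to an ordinary JSON parser, which is what makes the format compatible across parsers that know nothing about the streaming convention.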

In any case, I'm going to close this for now. If you find you need an unsealed Incomplete after all, feel free to reopen!

mandubian commented 9 years ago

I agree with you on everything :) JSON is to data formats what JS is to programming languages ;)
