Successful parses of partial strings

genovese commented 2 years ago

First off, instaparse is amazing! Thanks!

Second, I'm in a situation where I have a collection of parsers, where the parser being applied is based on a running state. (In principle, the parsers could all be combined, but it is advantageous for several reasons to be able to specify the modular parsers for the non-overlapping states as small and separate entities.) The transitions between the states are signalled by the content of the string. For example, in 'zap .... foo ....' , on seeing foo, we switch to the foo parser from the zap parser, ideally without zap having to know about foo and the other choices that are specified elsewhere. Here foo is guaranteed to be an invalid terminal in zap. Is there an efficient way for zap to parse this partial string and return a success when it sees foo, reserving failures for cases where there is no valid parse tree.

I can simulate this with two passes:

(partial-parse [parser text]
   (let [p (parser text)]
     (if (insta/failure? p)
        (parser (subs text 0 (:index p)))
        p)))

And then use insta/span to check the next point to parse. (This could be in the function but I kept it simple.)

Is there a simpler/more efficient/more robust way to do this in instaparse?

This lies somewhere between :total and :partial that on failure with a valid parse tree in hand returns a successful parse, perhaps with some meta-data indicating partiality.

Thanks

Engelberg commented 2 years ago

It sounds from your description that :total almost gives you what you're looking for, you just don't want that failure node embedded in the tree. Perhaps you can efficiently traverse the result and remove the failure node to get the "valid parse tree" you want.

Other ideas:

pre-process the string, chopping it up ahead of time into the different sections needed by the different parsers.
add to the grammar rules of the zap parser that it ends with a lookahead that matches the beginning of the foo parser, to make the :partial mode more effective for your use case.

If none of those ideas are feasible, I don't offhand see a better alternative to your two-pass approach.

genovese commented 2 years ago

Yes, almost. I was concerned that just removing the failure would leave it unclear (relative to the knowledge the parser has) if the tree was otherwise valid. All your suggestions are helpful, thanks. There are a few complications in practice, but I think I can make it work.

I appreciate your help.

genovese commented 2 years ago

A belated follow-up just for closure. You're right that :total does the job nicely for a partial parse if I can guarantee the existence of a character that cannot occur in the valid string (which I can in my case). I simply add a rule rest = '\0' say and put rest (prefixed by an appropriate separator) at the end of each rule for the start symbol. Parsing with :total parses all the real content correctly with :rest holding the remaining string in a failure node. Only one pass. It's easy to pull out for continuing, and most other failures imply that there is no :rest node which would be last, so it's clear when to look for a real failure. There's one case that's a bit problematic, but I think I can finesse it by checking the spans. Thanks again.

Engelberg / instaparse

Successful parses of partial strings #210