GerHobbelt / jison

bison / YACC / LEX in JavaScript (LALR(1), SLR(1), etc. lexer/parser generator)
https://gerhobbelt.github.io/jison/
MIT License
118 stars 20 forks source link

Use sub part of the parser #65

Open dooxe opened 2 years ago

dooxe commented 2 years ago

Hello !

Let's say we have the following parser grammar :

element
  : mathExpr
  | STRING
  ;

mathExpr
  : NUMBER
  | mathExpr '+' NUMBER
  ;

I'm wondering whether it is possible, inside actions, to use kind of "parseMathExpr(myString)" ?

Thanks

GerHobbelt commented 2 years ago

I haven't worked with the core code for quite some time, so apologies up front for possible faults in my recall.

The short(-ish) answer is no (as in: this is technically possible but recursive invocation of a generated parser may have unexpected side effects and should only be done when you have very special requirements and need environment changes during parse. I cannot think of an example that would make sense right now.)

The longer answer is: why would you do it that way? It would make sense when, for example, you are intent on parsing another QUITE DIFFERENT independent grammar that is encapsulated in that term. Say, for example, you are parsing a script language in grammar A, where you have embedded SQL in string rvalues as grammar B, where you feed the string tokens produced by grammar A parse to your grammar B (SQL in this example) to pull it apart further there. Ergo: this sort of thing is usually only done in very advanced scenarios where other design decisions have resulted in the requirement of having to have two different grammars (maintenance arguments, f.e.)

Your parseMathExpr() suggestion MAY be a sensible design decision when you want to look at expressions embedded in constant string values -- my personal preference as a language designer would however be to see if I could set up the lexer and parser language in such a way that I would have the lexer chop up such strings in series of tokens and then augment the parse grammar such that I can parse any viable expressions hiding in string constants like that without the need of such recursive parse calls. And when such a expression-hiding-in-string parse rule (rule set) fails, take those lexer tokens and glue them together into a single constant string token instead as a cop-out-on-not-having-found-a-viable-subexpr-in-string.

I imagine producing lexer tokens sequences like this one:. OPENQUOTE ID OPERATOR ID COLON CLOSEQUOTE for a sample input like "a + b:", and I would then have think about how I would handle the

OPENQUOTE error CLOSEQUOTE

rule as it's action code would have to produce a basic STRINGCONSTANT from that lexer tokensequence instead. (I would have to really dig into the Jason internals again before I would be able to produce a working example though!)

In this example one could therefore argue sensibly to make the expression-hiding-in-string stuff a separate grammar and invoke that second grammar's parser from within the action code for the STRINGCONSTANT token, either in the lexer or parser phase. This would be a-okay as you would always have a non-recursive parser A action code block calling parser B parser, which would be independent and therefore just act like any other library function call.

Your parseMathExpr() however reads to me like a recursive invocation of (part of) parser A by action code in parser A, which feels.... strange.

In THEORY, I kept reentrance in mind while modifying the jison parser and lexer cores, but from a language design perspective this way of using it still feels strange and immediately raises the question in my brain: why do you do this? Why can't you accomplish the same by making your parser rules themselves capable of handling your needs instead, I.e. why can't what you want be done in a straight (LALR or LR) grammar? Because, when you write it in your grammar rules, you get to keep the speed and power advances of the table based parser. Function calling a (sub)parser as (parser rule action) is way more common in LL style parsers, which are usually stack based (recursive rule parse functions calling one another).

Don't know if this helps you move forward or only raises questions, but my covering answer would therfor be: without knowing specifics, this is generally advised against (unless it's a parser A calling parser B scenario where parser input is a mix of two independent languages which must be parser at the same time). If you absolutely must, I believe it can be done, but be sure to have intimate understanding of the parser core then and have confidence about your awareness of potential risks and side effects, PLUS confidence in your ability to debug and fix any weirdnesses you will encounter along the way.

The parser core is designed to be reentrant and can be invoked recursively (IIRC), but I am hard pressed to imagine a scenario where this is mandated.

HTH

On Wed, Feb 9, 2022, 10:02 dooxe-creative @.***> wrote:

Hello !

Let's say we have the following parser grammar :

element : mathExpr | STRING ;

mathExpr : NUMBER | mathExpr '+' NUMBER ;

I'm wondering whether it is possible, inside actions, to use kind of "parseMathExpr(myString)" ?

Thanks

— Reply to this email directly, view it on GitHub https://github.com/GerHobbelt/jison/issues/65, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADCIHR4AGDKA4ZFHDOTUO3U2IUTJANCNFSM5N43CGJQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>

dooxe commented 2 years ago

Thanks a lot for your (very) extensive reply!

I ended up with the following solution inspired from your reply: I have a dedicated parser for mathematical expressions, that are surrounded by a specific character.