GerHobbelt / jison

bison / YACC / LEX in JavaScript (LALR(1), SLR(1), etc. lexer/parser generator)
https://gerhobbelt.github.io/jison/
MIT License
118 stars 20 forks source link

Result location @$ is undefined in v0.6.0-188 #12

Closed nwhetsell closed 7 years ago

nwhetsell commented 7 years ago

I’m not sure if I’m doing something wrong, but result locations @$ seem to be always undefined in v0.6.0-188. If I run

npm install https://github.com/GerHobbelt/jison/archive/0.6.0-188.tar.gz
cat <<'EOF' > test.jison
%lex
%%
. return 'CHARACTER';
/lex

%start characters

%%

characters
  : character
  | characters character
  ;

character: 'CHARACTER' { console.log(@$); };
EOF
node_modules/jison-gho/lib/cli.js test.jison
node --eval "require('$(pwd)/test.js').parse('abc')"

on macOS 10.12.6, the output is

undefined
undefined
undefined
GerHobbelt commented 7 years ago

@$ is the yylloc info result of the rule, similar to $$ being the value result.

Previously (0.4.18-xxx) you got away with this in most circumstances because 0.4.18 run-time always performed the old 'default actions':

$$ = $1;
@$ = @1;

before executing any rule action code chunk, so that $$ and @$ acted as identical copies of $1 and @1. v0.6.0-XXX doesn't do this any more (for run-time performance reasons) and only injects default actions when there's no user code at all (which is more like the bison behaviour, incidentally).

Anyway, long story short:

The correct way to reference any rule production term, is by numeric or named reference, e.g.

character: 'CHARACTER' { console.log(@1); }
               ;

but I never use the quoted token names as that is only extra overhead cost, hence your example would be written as (showcasing 'named reference' for CHARACTER, by the way):

%lex
%%
. return 'CHARACTER';
/lex

%start characters

// %token CHARACTER  <-- is *implicit* due to the way I wrote the character rule below

%%

characters
  : character
  | characters character
  ;

character: CHARACTER 
        { console.log(@CHARACTER); }
    ;

The next snippet is a self-contained executable, if you compile it with jison --main (see jison --help):

%lex
%%
. return 'CHARACTER';
/lex

%start characters

// %token CHARACTER  <-- is *implicit* due to the way I wrote the character rule below

%%

characters
  : character
  | characters character
  ;

character: CHARACTER 
        { console.log(@CHARACTER); }   // or use `@1` to ref the 1st term in the rule production
    ;

%%

// compile with `jison --main test.jison`
// then execute this `main()` with `node test.js`

parser.main = function () {
    parser.parse('abc');
};
GerHobbelt commented 7 years ago

BTW:

@$ is useful when you construct your own location reference for the rule, e.g.

rule: a b c d e f 
    { @$ = yylexer.mergeLocationInfo(##a, ##f); $$ = $a + $b + $c + $d + $e + $f; }

## is a jison-specific advanced use prefix: it transforms the production term reference name/number into the index number for the parse stack(s) yyvstack (value stack, i.e. the array carrying all the $a/$b/$c/... values), yylstack (location infos @a,@b,@c,...), yysstack (parse state stack, in case you want to use these internals), etc.

The supported references in parser action code:

nwhetsell commented 7 years ago

I actually am trying to access the location of the rule result, the “left hand side grouping” described in http://www.gnu.org/software/bison/manual/html_node/Actions-and-Locations.html#Actions-and-Locations. From that page:

…there is a default action for locations that is run each time a rule is matched. It sets the beginning of @$ to the beginning of the first symbol, and the end of @$ to the end of the last symbol.

I’m not sure that @$ was the same as @1 in Jison v0.4.18-186 and earlier. For example, here is a test of a Jison grammar created with Jison v0.4.18-184:

https://github.com/nwhetsell/linter-csound/blob/0494a0650bc93cbdd8a656239dd4376d42844754/lib/csound-parser/spec/orchestra-parser-spec.js#L453-L508

The part of the test of the object created from new parser.Instrument (https://github.com/nwhetsell/linter-csound/blob/0494a0650bc93cbdd8a656239dd4376d42844754/lib/csound-parser/spec/orchestra-parser-spec.js#L482-L505) checks how this part of the Jison grammar—

https://github.com/nwhetsell/linter-csound/blob/0494a0650bc93cbdd8a656239dd4376d42844754/lib/csound-parser/orchestra.jison#L511-L520

—parses this text (from https://github.com/nwhetsell/linter-csound/blob/0494a0650bc93cbdd8a656239dd4376d42844754/lib/csound-parser/spec/orchestra-parser-spec.js#L457-L459)

instr A440
  outc oscili(0.5 * 0dbfs, 440, giFunctionTableID)
endin

The first argument of the parser.Instrument constructor is @$. The test shows that the value of @$ in Jison v0.4.18-184 covered several lines starting at the instr token and ending at the endin token. If @$ was the same as @1, it would’ve been on one line, and only covered the instr token.

Is there any way to restore the default action for locations — that is, to bring back the old meaning of @$ — in Jison v0.6.0? Or do I need to pass the first and last token locations to my AST node constructors to get location information?

GerHobbelt commented 7 years ago

Nothing is set in stone. :wink:

Given your need (and mine, which is a bit different), I am thinking about what would be the best way to tackle this. The main reason to simplify/remove the default actions (particularly when you also have user actions for the given rule production) is speed: most grammars I know and use don't employ this behaviour but 'most' is certainly not 'everyone'.

The/my goal (fast parsers and lexers) is tackled through 'code analysis' and resulting selective code strippin/rewriting, which is now done using straight text manipulation via regexes and .replace() calls, remains as-is, this would imply that I need to have some sort of user-configurable 'switch' where I can simply select preferred behaviour in a few aspects. Maybe something like

%options default-action='<setting>'

where setting can be one of the following: "none", "bison", "ast". Or slightly more specific, as there's default value actions and default location actions:

%options default-action='<valuesetting>,<locationsetting>'
// valuesetting: none, bison, ast --> $$=undefined, $$=$1, $$=[$1....$n]
// locationsetting: none, bison  --> @$=undefined, @$=merge(@1..@n)

Anyway, that's just pondering how this might be made to look at the jison usage level. Making it happen is another story: I'm looking into migrating the code analysis and rewriting/stripping to a recast/jscodeshift-based process, but that is non-trivial, so will take some time.

The quick fix for you would be to recover (most of) the old behaviour but that would negate all my performance gains through simplified code as most of my grammars don't use these default actions, thus they are surplus=overhead for me. Let me think about it for a bit, maybe I come up with a decent 'quick fix' overnight.

nwhetsell commented 7 years ago

Is there an option I can use to get this code to run? That looks like it should restore the default action.

GerHobbelt commented 7 years ago

Not yet, except of course writing that code every explicit action, which is a bother and doesn't help produce readable grammars.

The mistake I made is bundling $$ and @$ treatment together (see for related material: https://github.com/zaach/jison/issues/343): right now in build 188/189, when you have an explicit action for a rule, it is detected and ALL default actions are not executed, while the only breaking change compared to https://github.com/zaach/jison/issues/343 which should have been done is kill the $$=$1 default action.

Anyway, I'm working on this right now to provide a better (and when the user wants, backwards compatible) behaviour re default action injection in the generated grammar.

GerHobbelt commented 7 years ago

release 0.6.0-191 has reverted to default 'classic' default behaviour ($$ = $1; @$ = merged;) and supports other default action modes (see jison --help for more on the new --default-action option).

This should fix this issue.


Notes (about $$ and @$ use in jison grammar actions since release 0.6.0-191)

Also note that jison now has better/tighter user-written action code analysis to help ensure a proper rule value result $$ and location result @$ are produced:

  1. when the user action code is absent OR does not set any of these ($$ and @$), then default action code will be injected for every action block = every rule production which doesn't have these assignments.

    In other words: if your code doesn't provide a $$ and/or @$ value, then jison will. (Unless your default mode has been configured as skip for the one(s) missing.)

  2. If the action code analyzer determines that you happen to access (read) the $$ and/or @$ values before assigning a value to them in the same action block, then the relevant default action will be injected before your action code, again depending on selected --default-action mode.

    In other words: if your code doesn't provide a $$ and/or @$ value to work with, then jison will. (Unless your default mode has been configured as skip for the one(s) missing.)

    Note that from now on $$ is fundamentally different from $0 and happens to be more in line with the way yacc/bison treat(ed) this.

    Also do note that read-accessing $$ and @$ before writing (assigning a value) to it in an action code block is very peculiar coding behaviour and certainly non-portable, not even in the yacc/bison world outside jison! (The same goes for $0 references, which are discouraged action code behaviour even though it is documented in a corner of the bison documentation. 😉 )

  3. If the action code analyzer determines that you happen to write-access (assign a value) to $$ and/or @$ values before you happen to read-access the same variable in the same action code block (which makes sense if your action code is performing non-trivial operations), jison WILL NOT inject the default action for that action as it would be unused anyway. Hence the analyzer attempts to produce the best performant parser code.

    Note that the analyzer is currently still a rather 'dumb' analyzer in that it analyzes your action code using regexes rather than a full JavaScript parse (and AST scan), hence "wicked" action code like this will confuse the analyzer into not injecting default actions before the user-written action code:

    function nasty(z) {
    $$ = z;  // this is analyzed as 'assignment before use' despite actual code flow!
    }
    var v = $$; // actual code flow: read before write!
    nasty(v + $1);
GerHobbelt commented 7 years ago

@nwhetsell: ping?

Jison should be backwards compatible again since build 0.6.0-191; see also the description of treatment of $$ (and @$) in the comment above in this issue.

nwhetsell commented 7 years ago

Yep, this appears to be fixed, thank you!