`lexer.conditions[K].rules` grows with each call to parse

ericprud commented 5 years ago

The lexer code in setInput switches the .rules to 1-indexed:

for (var k in conditions) {
  var spec = conditions[k];
  var rule_ids = spec.rules;
  var len = rule_ids.length;
  var rule_regexes = new Array(len + 1); // slot 0 is unused; we use a 1-based index approach here to keep the hottest code in `lexer_next()` fast and simple! 
  var rule_new_ids = new Array(len + 1);

  for (var i = 0; i < len; i++) {
    var idx = rule_ids[i];
    var rule_re = rules[idx];
    rule_regexes[i + 1] = rule_re;
    rule_new_ids[i + 1] = idx;
  }

  spec.rules = rule_new_ids;
  spec.__rule_regexes = rule_regexes;
  spec.__rule_count = len;
}

If you generate a parser and use it a second time, .rules ends up with an undefined at index [1]. For some reason "foo".match(undefined) matches so you end up an in endless look where undefined matches but doesn't advance the cursor.

I found a terrible and a maybe-not-so-bad work-around.

terrible - ignore leading undefines

      for (var i = 1; i <= len; i++) {
        tempMatch = regexes[i] ? this._input.match(regexes[i]) : null;

pros: one line to change cons: it grows by one undefined on each iteration

maybe-no-so-bad - `s/len \+ 1/len/` and unshift into the rules

        for (var k in conditions) {
          var spec = conditions[k];
          var rule_ids = spec.rules;
          var len = rule_ids.length;
          var rule_regexes = new Array(len);             // slot 0 is unused; we use a 1-based index approach here to keep the hottest code in `lexer_next()` fast and simple! 
          var rule_new_ids = new Array(len);

          for (var i = 0; i < len; i++) {
            var idx = rule_ids[i];
            var rule_re = rules[idx];
            rule_regexes[i] = rule_re;
            rule_new_ids[i] = idx;
          }

          spec.rules = rule_new_ids;
          spec.__rule_regexes = rule_regexes;
          spec.__rule_count = len;
        }

and in the constructor, prefix each rules block with an undefined:

Object.keys(lexer.conditions).forEach(c => lexer.conditions[c].rules.unshift(null))

I did this just after

    conditions: {
      'INITIAL': {
        rules: [...]

and was able to use the parser a zillion times (well, north of 2000, at least).

GerHobbelt commented 5 years ago

Good catch. Hm. Better to fix this in the code generator itself so that my (already hacky ;-) ) shift-by-1 in there is not needed any more.

ericprud commented 5 years ago

What, you don't like my terrible solution?!

Yeah, I considered diving into the generator but thought I'd need your guidance in order to make it reasonably productive.

ericprud commented 5 years ago

/me politely pestering

GerHobbelt commented 5 years ago

:-) pestering is ok

Will be busy with other matters this week. I have looked at this and want to do this as part of a new jison release, which will take some time as the road I had taken for further jison development is wrong and I need to get jison back on the road. Long story there, bottom line: this will take several weeks to get done.

The alternative is doing a kind of hotfix patch release earlier, but I was hoping I could do without that if you can wait that long. Otherwise it's gotta be a hotfix patch next week or something...

On Mon, Nov 25, 2019, 10:31 ericprud notifications@github.com wrote:

/me politely pestering

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/GerHobbelt/jison/issues/52?email_source=notifications&email_token=AADCIHRXSIKA4VZB244HSTTQVOLQNA5CNFSM4I54AYX2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEFBXKBQ#issuecomment-558068998, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADCIHTEJGRMZSQWIFPU6C3QVOLQNANCNFSM4I54AYXQ .

ericprud commented 4 years ago

here's another friendly pester in case the holidays bring a little time to pick through backlogs...

if you want someone to pair with, i won't be a ton of help at first but i'm keenly interested in the product.

ericprud commented 4 years ago

another pester to remind you that you have a fan base excited to use your parser

GerHobbelt commented 4 years ago

This should be fixed in the upcoming 0.7.0 release, as the lexer kernel code has been fixed by checking if the lexer decompression work has already been done (the this.__decompressed flag...):

    setInput: function lexer_setInput(input, yy) {
        this.yy = yy || this.yy || {};

        // also check if we've fully initialized the lexer instance,
        // including expansion work to be done to go from a loaded
        // lexer to a usable lexer:
        if (!this.__decompressed) {
          // step 1: decompress the regex list:
            let rules = this.rules;
            for (var i = 0, len = rules.length; i < len; i++) {
                var rule_re = rules[i];

            // compression: is the RE an xref to another RE slot in the rules[] table?
                if (typeof rule_re === 'number') {
                    rules[i] = rules[rule_re];
                }
            }

          // step 2: unfold the conditions[] set to make these ready for use:
            let conditions = this.conditions;
            for (let k in conditions) {
                let spec = conditions[k];

                let rule_ids = spec.rules;

                var len = rule_ids.length;
                let rule_regexes = new Array(len + 1);            // slot 0 is unused; we use a 1-based index approach here to keep the hottest code in `lexer_next()` fast and simple!
                let rule_new_ids = new Array(len + 1);

                for (var i = 0; i < len; i++) {
                    let idx = rule_ids[i];
                    var rule_re = rules[idx];
                    rule_regexes[i + 1] = rule_re;
                    rule_new_ids[i + 1] = idx;
                }

                spec.rules = rule_new_ids;
                spec.__rule_regexes = rule_regexes;
                spec.__rule_count = len;
            }

            this.__decompressed = true;
        }

GerHobbelt / jison