cdiggins / myna-parser

Myna Parsing Library
https://cdiggins.github.io/myna-parser
MIT License

delay() rule ID number changes each time #7

Closed elidoran closed 7 years ago

elidoran commented 7 years ago

I'm setting up an AST evaluator and instead of matching rules by their name I am using their id in the hopes it'll be faster to do number comparison than string comparison.

It's working except for the one delayed rule created via myna.delay(function() { return myna.choice(...) }).

When I run myna.parse(rule, input), the ID of the delayed node is brand new: it's the first one after the last of my rules' IDs.

And if I run parse() repeatedly, the ID changes each time. It grows by 10. So I'm wondering what's going on in parse() that creates 10 new rules. Perhaps that one rule is being used 10 times?

I'm thinking the original ID should remain the same. Maybe not for Delay.

Ah-hah. I looked in the JS source and found the M.choice() I'm using in the delayed rule creates a new Choice rule. So, it's doing this every time the delayed rule is used during a parse().

Do you think it's possible to reuse the ID from the "delayed rule" in the rule created when the deferred execution is resolved? In what I'm doing, I mean the myna.choice() would create the new Choice rule and use the ID from its wrapping Delay rule.

Wait, does it call that every time the rule is encountered or does it resolve the first time?

It looks like the delay is created with a function which is passed to the Delay constructor and then called in parseImplementation() as this.fn().parse(p). So it seems to re-resolve the rule every time.

So, what do you think about making it capable of resolving only once and reusing the id assigned to the Delay rule which wrapped it?
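
For illustration, here is a minimal sketch of the "resolve only once" idea, written without Myna so it runs on its own. The helper name once() and the stand-in makeRule() are hypothetical and not part of the library; makeRule() only counts how many times a rule would have been constructed.

```javascript
// Hypothetical helper, not part of Myna: wraps a rule factory so that the
// factory runs on the first call only and the same result is returned after.
function once(factory) {
  var cached            // holds the resolved rule after the first call
  var resolved = false
  return function () {
    if (!resolved) {
      cached = factory()
      resolved = true
    }
    return cached
  }
}

// Stand-in for a rule constructor such as M.choice(); it only counts calls.
var created = 0
function makeRule() {
  created += 1
  return { kind: 'choice', serial: created }
}

var resolve = once(makeRule)
var first = resolve()
var second = resolve()

console.log(first === second, created) // prints: true 1
```

Assuming delay() accepts any zero-argument function that returns a rule, as described above, the wrapped function could be passed to it in place of the original one, so the inner choice() would be built a single time.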

cdiggins commented 7 years ago

I'm so sorry for the delay, somehow my Git notifications are not getting classified correctly. I need to change my email settings. I will need to get back to you on this in the evening.

PS: I am doing some heavy optimization in another branch, so far seeing very dramatic speed ups.

elidoran commented 7 years ago

It's alright, I understand, we're all busy. I have plenty to do until you have time to get back to me on this. I'm mostly playing around with this stuff.

I'm glad to hear there are improvements on the way because:

  1. enhancements always seem to be an adventure
  2. I've been running a JSON parsing comparison of Myna, Chevrotain, Parsimmon, and a state machine processor I'm working on and Myna has been performing on the low end. I was waiting to hear about this Delay thing before mentioning it, but, you brought it up, so I thought I'd throw it out there.

Have a good day.

cdiggins commented 7 years ago

@elidoran I removed IDs in the latest version of Myna. To achieve what you originally wanted, you would need to augment your parse tree after creation with IDs derived from the rule names. I hope this doesn't make your work too much harder.
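
A rough sketch of that kind of post-processing, assuming each tree node looks like { name, children }. That shape is an assumption made for this example and should be checked against the Myna version in use; assignIds() is a hypothetical helper, not a library function.

```javascript
// Walks a tree and gives every node a numeric id derived from its name.
// Assumed node shape (for illustration only): { name: string, children: array }.
// idsByName is a Map from rule name to numeric id; new names get the next number.
function assignIds(node, idsByName) {
  if (!idsByName.has(node.name)) {
    idsByName.set(node.name, idsByName.size + 1)
  }
  node.id = idsByName.get(node.name)
  ;(node.children || []).forEach(function (child) {
    assignIds(child, idsByName)
  })
  return node
}

// A small hand-built tree standing in for a parse result.
var tree = {
  name: 'array',
  children: [
    { name: 'number', children: [] },
    { name: 'array', children: [{ name: 'number', children: [] }] }
  ]
}

var ids = new Map()
assignIds(tree, ids)

console.log(tree.id, tree.children[0].id, tree.children[1].id) // prints: 1 2 1
```

Nodes that share a name end up with the same number, so an evaluator can switch on node.id instead of comparing name strings.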

elidoran commented 7 years ago

Well, that's one way to go :)

Adapting it was easy. Instead of using the IDs it made, I simply assigned my own IDs to the rules.

The Delay was still a problem, though. Its function is called more than once instead of resolving the first time.

So I cached the choice() it creates and return the cached rule on subsequent calls.

Also, the first AST node matching the delayed rule does not have the ID set in its rule. So, I have to match that via its name the first time. After that, it has the ID on it.

Relevant part:

// cache for the Choice rule so the delayed function builds it only once
var choice
var thing = M.delay(function() {
  console.log('delay()')
  return choice || (choice = M.choice(...))
})
var other = M.blah()

// assign my own numeric IDs to the rules
thing.id = 1
other.id = 2

// in AST tree evaluator
switch (ast.rule.id) {
  case 1:
    // thing: ast.rule.id is 1 the second and later times
    console.log('later match ast.rule.id', ast.rule.id)
    break
  case 2: /* other */ break
  default:
    // fall back to matching by name when the ID is missing
    if (ast.rule.name === 'thing') {
      // ast.rule.id is undefined the first time only.
      // the next time it'll have the 1.
      console.log('first match ast.rule.id is', ast.rule.id)
    }
}

What I see for output is:

delay()
delay()
first match ast.rule.id is undefined
later match ast.rule.id 1

So, this is better. It only does delay() twice now and the ID works from the second time on.

Also, the newer version is faster. Great job.