TypeFox / chevrotain-allstar

Plugin module for the ALL(*) lookahead algorithm in Chevrotain
MIT License
10 stars 0 forks source link

Questions about ambiguity warnings, and working with ambiguous alternations #3

Open matthew-dean opened 1 year ago

matthew-dean commented 1 year ago

Correct me if I'm wrong, but I would think that this algorithm would be tolerant with ambiguities in OR expressions. To be fair, this lookahead strategy eliminates most ambiguity warnings, but sometimes, some persist, and IGNORE_AMBIGUITIES has no effect. Is there a workaround?

matthew-dean commented 1 year ago

I had a look through the code, and AFAICT, IGNORE_AMBIGUITIES is never checked before building an ambiguity error? I think this is subverting Chevrotain's default behavior, which is that even if two alternatives match entirely (or their lookaheads do), the first one can be chosen and the warning is suppressed.

msujew commented 1 year ago

You're right, we currently don't check the flag. Note that an ambiguity in this package hints at a very serious grammar flaw and should not be disregarded. Compared to the LL(k) lookahead, an ambiguity in ALL(*) leads to very serious performance degradation, as the algorithm doesn't stop until it is absolutely sure it's ambiguous. Likely what you're experiencing in https://github.com/langium/chevrotain-allstar/issues/5.

You can override the print behavior anyway with the constructor options of the strategy.

matthew-dean commented 1 year ago

@msujew You're right. I get degraded performance when ambiguities exist. I assume that gates are applied before checking for ambiguities?

msujew commented 1 year ago

I assume that gates are applied before checking for ambiguities?

Yes, similar to the default LL(k) behavior in Chevrotain, gates are checked beforehand. In ANTLR4, gates are checked on demand, but that also leads to performance issues in ambiguous grammars - just different ones. While chevrotain-allstar struggles if the gates are expensive, ANTLR4's strategy struggles in case of long prefixes (that could be resolved by performing the gate check beforehand).

An adaptive gate evaluation mode (i.e. try lazy until encountering a long prefix - then switch to eager evaluation) would be quite interesting to investigate. Unfortunately nothing I have time for in the foreseeable future :(

matthew-dean commented 1 year ago

@msujew Is there a way to tell chevrotain-allstar to ignore a certain path when testing for ambiguity?

Note that an ambiguity in this package hints at a very serious grammar flaw and should not be disregarded.

I don't necessarily disagree, but CSS's grammar is defined with ambiguities, and CSS is not a niche grammar.

To explain: in CSS, there are a number of places in the definition where it specifies a specific token pattern, but it has a "fallback" definition. For instance, for a media query, it allows a parenthetical like (min-width: 640px), or a "range" like (width > 640px) OR what it calls "general enclosed" which is ([any sequence of tokens]). Obviously, any sequence is inherently ambiguous and can match the other two patterns. CSS is intentionally ambiguous in lots of places, because the intention is that it's up to a user agent to interpret what that sequence of tokens may mean, if anything. So, there are increasingly places in CSS grammar where it basically says to "try and restart". I know Chevrotain by default doesn't handle this very well, but I'm wondering if this package may help.

So, in my case, I want it to determine if it's a declaration or a range, but only if it fully matches. Otherwise, it should just parse as a general sequence of tokens. Can you think about the best way to approach this?

matthew-dean commented 1 year ago

I tried something like the following, because I naively thought that perhaps if I only passed on alternation into the lookahead function at a time, that would be the only one "tested".

orInternal<T>(
    altsOrOpts: Array<IOrAlt<any>> | OrMethodOpts<unknown>,
    occurrence: number
  ): T {
    let alts: Array<IOrAlt<any>>
    let laKey = this.getKeyForAutomaticLookahead(OR_IDX, occurrence)
    let optionalAlt = false

    if (isArray(altsOrOpts)) {
      alts = altsOrOpts
    } else {
      alts = altsOrOpts.DEF
      if (altsOrOpts.IGNORE_AMBIGUITIES) {
        optionalAlt = true
      }
    }

    let laFunc = this.getLaFuncFromCache(laKey)
    let altIdxToTake: any
    if (optionalAlt) {
      let altLength = alts.length
      for (let i = 0; i < altLength; i++) {
        let alt = alts[i]!
        /** Try a single alt at a time */
        let pass = laFunc.call(this, [alt])
        if (pass !== undefined) {
          altIdxToTake = i
          break
        }
      }
    } else {
      altIdxToTake = laFunc.call(this, alts)
    }
    if (altIdxToTake !== undefined) {
      const chosenAlternative: any = alts[altIdxToTake]
      return chosenAlternative.ALT.call(this)
    }
    this.raiseNoAltException(
      occurrence,
      (altsOrOpts as OrMethodOpts<unknown>).ERR_MSG
    )
  }

Unfortunately, I discovered that the lookahead functions are pre-cached for each alternation set, so this didn't work. 🤔