Open matthew-dean opened 1 year ago
I had a look through the code, and AFAICT, IGNORE_AMBIGUITIES
is never checked before building an ambiguity error? I think this is subverting Chevrotain's default behavior, which is that even if two alternatives match entirely (or their lookaheads do), the first one can be chosen and the warning is suppressed.
You're right, we currently don't check the flag. Note that an ambiguity in this package hints at a very serious grammar flaw and should not be disregarded. Compared to the LL(k) lookahead, an ambiguity in ALL(*) leads to very serious performance degradation, as the algorithm doesn't stop until it is absolutely sure it's ambiguous. Likely what you're experiencing in https://github.com/langium/chevrotain-allstar/issues/5.
You can override the print behavior anyway with the constructor options of the strategy.
@msujew You're right. I get degraded performance when ambiguities exist. I assume that gates are applied before checking for ambiguities?
I assume that gates are applied before checking for ambiguities?
Yes, similar to the default LL(k) behavior in Chevrotain, gates are checked beforehand. In ANTLR4, gates are checked on demand, but that also leads to performance issues in ambiguous grammars - just different ones. While chevrotain-allstar struggles if the gates are expensive, ANTLR4's strategy struggles in case of long prefixes (that could be resolved by performing the gate check beforehand).
An adaptive gate evaluation mode (i.e. try lazy until encountering a long prefix - then switch to eager evaluation) would be quite interesting to investigate. Unfortunately nothing I have time for in the foreseeable future :(
@msujew Is there a way to tell chevrotain-allstar
to ignore a certain path when testing for ambiguity?
Note that an ambiguity in this package hints at a very serious grammar flaw and should not be disregarded.
I don't necessarily disagree, but CSS's grammar is defined with ambiguities, and CSS is not a niche grammar.
To explain: in CSS, there are a number of places in the definition where it specifies a specific token pattern, but it has a "fallback" definition. For instance, for a media query, it allows a parenthetical like (min-width: 640px)
, or a "range" like (width > 640px)
OR what it calls "general enclosed" which is ([any sequence of tokens])
. Obviously, any sequence is inherently ambiguous and can match the other two patterns. CSS is intentionally ambiguous in lots of places, because the intention is that it's up to a user agent to interpret what that sequence of tokens may mean, if anything. So, there are increasingly places in CSS grammar where it basically says to "try and restart". I know Chevrotain by default doesn't handle this very well, but I'm wondering if this package may help.
So, in my case, I want it to determine if it's a declaration or a range, but only if it fully matches. Otherwise, it should just parse as a general sequence of tokens. Can you think about the best way to approach this?
I tried something like the following, because I naively thought that perhaps if I only passed on alternation into the lookahead function at a time, that would be the only one "tested".
orInternal<T>(
altsOrOpts: Array<IOrAlt<any>> | OrMethodOpts<unknown>,
occurrence: number
): T {
let alts: Array<IOrAlt<any>>
let laKey = this.getKeyForAutomaticLookahead(OR_IDX, occurrence)
let optionalAlt = false
if (isArray(altsOrOpts)) {
alts = altsOrOpts
} else {
alts = altsOrOpts.DEF
if (altsOrOpts.IGNORE_AMBIGUITIES) {
optionalAlt = true
}
}
let laFunc = this.getLaFuncFromCache(laKey)
let altIdxToTake: any
if (optionalAlt) {
let altLength = alts.length
for (let i = 0; i < altLength; i++) {
let alt = alts[i]!
/** Try a single alt at a time */
let pass = laFunc.call(this, [alt])
if (pass !== undefined) {
altIdxToTake = i
break
}
}
} else {
altIdxToTake = laFunc.call(this, alts)
}
if (altIdxToTake !== undefined) {
const chosenAlternative: any = alts[altIdxToTake]
return chosenAlternative.ALT.call(this)
}
this.raiseNoAltException(
occurrence,
(altsOrOpts as OrMethodOpts<unknown>).ERR_MSG
)
}
Unfortunately, I discovered that the lookahead functions are pre-cached for each alternation set, so this didn't work. 🤔
Correct me if I'm wrong, but I would think that this algorithm would be tolerant with ambiguities in OR expressions. To be fair, this lookahead strategy eliminates most ambiguity warnings, but sometimes, some persist, and
IGNORE_AMBIGUITIES
has no effect. Is there a workaround?