Closed ChrisHixon closed 2 years ago
@mingodad
I believe the issue you discovered here: https://github.com/ChrisHixon/chpeg/issues/22#issuecomment-1169821880 is a case of this issue (https://github.com/ChrisHixon/chpeg/issues/28)
TRIM now has the auto-leaf feature, so {L}
is automatically added to any rule with TRIM.
You have one rule with TRIM, NAME, then another rule with TRIM getting called via SKIP <- COMMENT <- COMMENT_LONG <- STRING_LONG from SKIP at the end of NAME. This second TRIM is overriding because there can only be one TRIM on a node; the second TRIM isn't getting its own node to work on since tree building has been locked(disabled), so you're getting the contents of the STRING_LONG TRIM in NAME.
NAME <- !KEYWORD <NAME_PREFIX NAME_SUFFIX?> SKIP
SKIP <- (SPACE+ / COMMENT)*
COMMENT <- '--' (COMMENT_LONG / COMMENT_SHRT)
COMMENT_LONG <- STRING_LONG
STRING_LONG <-
'[' $lsep<'='*> '[' LINEBREAK? <( ! (']' $lsep ']') . )*> ']' $lsep ']'
Hopefully that made sense. I can't think of a workaround at the moment, other than removing one of the TRIMs.
So there really is a class of issues related to tree building being inhibited.
That makes sense and it can happen on any non trivial grammar .
But removing one TRIM
is not an option for the whole grammar, in this case Lua
grammar shown here https://github.com/yhirose/cpp-peglib/issues/229#issuecomment-1169847228.
Meanwhile I think that chpeg
should emit an error message in cases like this one, something like:
input:xx:yy: Nested trim found.
There would need to be a lot of warnings including the other cases I mention in this issue. I'd rather devote time to fixing the issue. This is an issue only with extensions that expect certain nodes to be present... TRIM and REFS (back-references). So tree 'locking', which made sense without these extensions, has to be revisited; probably a new approach is needed. Another workaround is not using TRIM or nested backreferences (global scope should be okay). It's possibe to capture what you want without TRIM but it will require changing rules, new rules, and probably more AST Options ({[SIL]}
), etc.
But even then I don't think that is nice to silently accept a grammar and output broken AST, if there is lot's of warnings/errors the user (we) should be informed.
Also see comments here https://github.com/yhirose/cpp-peglib/issues/229#issuecomment-1170013965
A bit related see also this issue https://github.com/yhirose/cpp-peglib/issues/230#issue-1289025524
I think this issue is fixed for TRIM with this commit: https://github.com/ChrisHixon/chpeg/commit/06d10be934783b8c4af2479ba484fdbaab52d7dc
I'm still testing this change. It could use more testing if you have a chance. The playground is updated with this fix: https://chrishixon.github.io/chpeg/playground/
Some grammars that somehow was working before isn't now, like the nelua
(https://github.com/ChrisHixon/chpeg/issues/20#issuecomment-1169645706) with this input https://github.com/edubart/nelua-lang/blob/master/lib/hashmap.nelua:
Output:
Parse failed with result: 1
input:23:26: Expected: ")" in funcbody
Input line with error:
-- Ceil integer division.
local function ceilidiv(x: usize, y: usize): usize <inline> ////!!!!<<< here
return (x + y - 1) // y
end
I did blow away the nested TRIM handling with this fix, for example:
Grammar:
A <- < 'foo' <'a'> / 'bar' <'b'> >
Input:
barb
This currently captures b
, but according to previous behavior the outer < ... >
should take precedence and capture barb
I think the previous behavior is more correct than just letting whatever runs last overwrite the trim values.
What if when inside a capture then ignore any other new capture creation ?
Probably my last suggestion isn't good because when you cache a rule you need to invalidate it just in case it has any nested TRIMs.
What if when inside a capture then ignore any other new capture creation ?
That's essentially the effect of what happened previously, though inner levels are not ignored, it pushed each level onto the stack. I'd better restore that behavior, I think there may be other issues otherwise.
Packrat should be OK, nodes cached will end up with the trimmed values. But this is another thing to test.
I haven't investigated your error above, let's see if maybe it'll be fixed after I restore the previous nested trim behavior.
Here's the nested TRIM fix: be032f2df8dbc8d33e6a179dbd96dfc65c415ebd I was unable to recreate the nelua problem, that parsed fine for me before and after this fix.
Yes the nelua is working now for me too, but a Lua grammar (https://github.com/ChrisHixon/chpeg/issues/20#issuecomment-1171051281) that was working stopped now (I'm investigating right now). With this input https://github.com/edubart/lpegrex/blob/main/lpegrex.lua . Output:
Parse failed with result: 3
input:389:50: Expected: "\t" in NAME
Offending line:
Suffix = l.Cf(l.V"Primary" *
( S * ( l.P"+" * l.Cc(1, lmt.__pow)
+ l.P"*" * l.Cc(0, lmt.__pow)
+ l.P"?" * l.Cc(-1, lmt.__pow)
+ l.P"~?" * l.Cc(l.Cc(false), lmt.__add)
+ "^" * expect( l.Cg(Num * l.Cc(mult)) ////!!!!!<<<<< Offending line
+ l.Cg(l.C(l.S"+-" * l.R"09"^1) * l.Cc(lmt.__pow)
+ Name * l.Cc"lab"
),
"ExpNumName")
+ "->" * expect(S * ( l.Cg((String + Num) * l.Cc(lmt.__div))
+ l.P"{}" * l.Cc(nil, l.Ct)
+ defwithfunc(lmt.__div)
),
"ExpCap")
+ "=>" * expect(S * defwithfunc(l.Cmt),
"ExpName1")
+ "~>" * S * defwithfunc(l.Cf)
) --* S
)^0, function(a,b,f) if f == "lab" then return a + l.T(b) end return f(a,b) end );
And using only the the subset with error as input we get:
Suffix = l.Cf(l.V"Primary" *
( S * ( l.P"+" * l.Cc(1, lmt.__pow)
+ l.P"*" * l.Cc(0, lmt.__pow)
+ l.P"?" * l.Cc(-1, lmt.__pow)
+ l.P"~?" * l.Cc(l.Cc(false), lmt.__add)
+ "^" * expect( l.Cg(Num * l.Cc(mult))
+ l.Cg(l.C(l.S"+-" * l.R"09"^1) * l.Cc(lmt.__pow)
+ Name * l.Cc"lab"
),
"ExpNumName")
+ "->" * expect(S * ( l.Cg((String + Num) * l.Cc(lmt.__div))
+ l.P"{}" * l.Cc(nil, l.Ct)
+ defwithfunc(lmt.__div)
),
"ExpCap")
+ "=>" * expect(S * defwithfunc(l.Cmt),
"ExpName1")
+ "~>" * S * defwithfunc(l.Cf)
) --* S
)^0, function(a,b,f) if f == "lab" then return a + l.T(b) end return f(a,b) end );
Error:
Parse failed with result: 3
input:11:47: Expected: HEX_NUMBER in Number
The same Lua grammar replacing {I}
by ~
in front of the rule works fine on cpp-peglib
.
I can verify that works on master but fails on these latest TRIM fixes. The nature of the errors may indicate something strange happening in the VM. I'll check it out it more detail.
Note the result code: Parse failed with result: 3
This is a stack overflow. Pushing 6 items per IDENT call (vs. 4 previously) blows the stack with this grammar/input.
I raised the default stack size to 2048 and it parses fine. The chpeg-trace -t1 is identical with master, commit before TRIM changes, and after.
Yes it's working now !
Thank you so much !
Maybe it's a good idea to extend the error message with a description of the error, otherwise we end up in auto mode and ignore anything result: \d+
because it's not relevant most of the time.
Something like:
Parse failed with result: 3 -> Stack overflow
Yes it's working now ! Thank you so much ! Maybe it's a good idea to extend the error message with a description of the error, otherwise we end up in auto mode and ignore anything
result: \d+
because it's not relevant most of the time. Something like:Parse failed with result: 3 -> Stack overflow
Yes that's a good idea. Also, printing the tracked parse errors makes no sense unless it's a normal parse failure ( CHPEG_ERR_PARSE_FAILED = 1)... they will be nonsensical like what we saw here.
That's true and would call our attention when a significant error code/message appear instead of ignore by default.
So I think that takes care of part 1 of this issue. I'm thinking of doing a similar thing with REFS to remove the dependence on parse tree nodes. I think it'll actually be more like it was initially implemented but with the undo system that solves other issues. After that's done I think it would be possible to have an option to parse without building a tree.
That's nice for syntax check only on complex grammars during the development/debug/tuning.
Have you used any PEG implementation that can do context sensitive things like back-references, or better yet, something like symbol tables, without creating a parse tree? Actually which ones have symbol tables period?
I'm running into snags with my parse-tree-free reference implementation attempt and I think I'll have to abandon it and find a different way to solve the topic issue. Issues are similar to those I've encountered previously, backtracking being the enemy. Using parse tree nodes with undo actions is easier, more elegant, packrat compatible, and I think it'll work well for future context sensitive things, treating the parse tree like a DOM of sorts. Every other approach I've attempted seems to lead to a big mess, is inefficient, or not packrat compatible, etc.
The only one I've tried was nez
but it doesn't work as advertised it's abandoned but it's ideas are interesting.
The other one that I'm actively using is peg/leg
from Piumarta in this fork https://github.com/mingodad/peg it's reasonable easy to work with it and performance is also good, but for grammar development cpp-peglib
and chpeg
are the kings right now.
I believe this commit fixes this issue: https://github.com/ChrisHixon/chpeg/commit/04d2721bb428ad0e4cb2b1d61dfa8037399b61bc
You can now place an {I}
on the start rule to get no parse tree or {L}
to get one node spanning the entire parse.
The previously 'locked' tree under IGNORE/LEAF is now 'restricted' to only allow nodes through that absolutely require a parse tree node for functionality (currently, only REFSCOPE). Child nodes under {I}/{L}
are deleted after the fact.
Just after reloading the playground and trying the grammar shown bellow (calc) with adding {i}
to Expr
seems to freeze the browser:
Expr {I} <- _ Sum EOF
Sum <- Product (PLUS_MINUS Product)*
Product <- Value (MUL_DIV Value)*
Value <- NUMBER / ParenExpr
ParenExpr <- LPAREN Sum RPAREN
PLUS_MINUS <- < [+-] > _
MUL_DIV <- < [*/] > _
NUMBER <- < '-'? ("0" / [1-9][0-9]*) > _
LPAREN {I} <- < '(' > _
RPAREN {I} <- < ')' > _
_ {I} <- [ \t\r\n]*
EOF <- !.
Input:
1 +
(2 * (3 + 44 / 5) - 6)
* 7 + 8 - 9
It seems to work fine for me in Chromium and Firefox using that grammar/input. Maybe it didn't update for you yet - try reloading page. If still fails, can you give more info, browser version, console errors, etc.
Loading it locally after git pull
I'm getting a frozen browser (chromium) on Linux64:
chpeg-playground.js:1 Uncaught RuntimeError: Aborted(Assertion failed: self->tree_root->num_children == 1, at: ./../include/chpeg_ext.h,3897,ChpegParser_parse). Build with -s ASSERTIONS=1 for more info.
at abort (chpeg-playground.js:1:9187)
at ___assert_fail (chpeg-playground.js:1:12748)
at chpeg-playground.wasm:0x89d2
at chpeg-playground.wasm:0xa257
at Module._parse (chpeg-playground.js:1:14476)
at ccall (chpeg-playground.js:1:4683)
at chpeg-playground.js:1:5073
at index.js:123:5
But after rebuild the playground it works fine, so have you rebuilt the playground ?
I did and have been using it from https://chrishixon.github.io/chpeg/playground/ I just attempted again and nothing changed so there's nothing to commit/push. It must be GitHub's infrastructure/CDN is not updated for you.
Yes, probably you are right because I did a refresh again and now it's working as you mention. So need to take this in account for fresh updates on github pages.
But when I reloaded again the same console error came back, but with my fresh local build it works fine. So I'll try again latter to see if it succeed.
It was my fault I just was testing on my old playground here https://meimporta.eu/chpeg/ because I have several tabs open and didn't noticed I'm on my outdated playground. Sorry again for the noise ! Thanks again for your great work and help !
No problem, thanks for your help testing, and finding/reporting bugs! Closing this issue.
Adding
{L}
to a rule has the effect of inhibiting parse tree building under the node created for that rule. However, some things are going to have a problem with that, and probably fail in strange/undefined ways. One such case is nested back-references, which require reference scope nodes:Grammar:
Input:
Result:
Without the
{L}
onROOT
, this parses ok.This is likely not just a problem with
{L}
nodes,{I}
also inhibits tree building.What else might have issues related to this ... things that require or assume a node is present?
TRIM, for one. Trim operates on the top node on the stack, assuming that node is the 'current node' (its rule's node.)