Open st235 opened 3 months ago
path-empty
also looks similar to path-abempty
. Do you think it would be better to reorder the rules in the following order:
path-absolute / path-noscheme / path-rootless / path-abempty
and remove path-empty
?
Did you encounter a problem because of this? I think the current version of abnf has enough backtracking to handle this case. Probably we need a test or two to ensure that this isn't a problem.
Did you encounter a problem because of this? I think the current version of abnf has enough backtracking to handle this case.
Hey, thank you for a swift reply!
Actually, nope, don't have any problems so far, but I have not tested for this specific scenario.
It just looked peculiar when I read the ABNF
notation, so I may be utterly wrong here.
Sorry if I confused you!
You're not wrong to make this observation; there are several rules in the various RFCs akin to this one. I should add a test for this rule to ensure that the parser generated for RFC3886 does what one expects.
I saw that you reordered dec-octet
and that is what I had in mind when I was googling for problems with dec-octet
. This is actually how I found your repo in the first place 😅
I am talking about this part of the code
# alternatives reordered for correct matching.
'dec-octet = "25" %x30-35 / "2" %x30-34 DIGIT / "1" 2DIGIT / %x31-39 DIGIT / DIGIT',
(On a side note, really impressive project!)
Another problem within RFC3986
I am feeling is happening is path
rule, and I was wondering if you had any issues with this rule or noticed something similar?
At least it seems that path-abempty
and path-empty
are similar when matching for an empty string. Given both of them match for it, path-abempty
should always win. (not a big problem though that one of the rules is not used).
At the same time, it seems that one can match path-abempty
with literally any string as it may converge to matching an empty substring, and given that the query
and fragment
are not the children of path
rule, I am not 100% sure how it is resolved.
Take this with a grain of salt, as I may be wrong, though would be happy if it turns out to be helpful.
@dekkers I actually figured it out, sorry for the hustle.
It seems that path
rule itself is not used anywhere in the grammar.
Path children expressions are used in heir-part
and relative-part
...
'hier-part = "//" authority path-abempty / path-absolute / path-rootless / path-empty',
...
'relative-part = "//" authority path-abempty / path-absolute / path-noscheme / path-empty',
Though in these rules path-abempty
is protected by terminal //
symbols that help to select between alterations.
Really confusing definitions! So URI
-rule should be safe and sound.
Though the issues hold for path
-rule.
The snippet below (with empty string) always produces AST with one child: path-abempty
, so it does not reach path-empty
.
src = ''
node, start = rfc3986.Rule('path').parse(src, 0)
print(str(node))
Node(name=path, children=[Node(name=path-abempty, children=[])])
It seems that
path
should also be reordered, right?path-abempty
always matches any string as it accepts 0 iterations.This is a snippet I am referring to: