Closed michaelblyons closed 4 years ago
That's a good find. SBNF shouldn't ever get into an infinite loop, but I think this is one of those edge cases that just had to be found through experimentation. The reason it is looping in this case is due to branch points: The symbol
must have a branch point as it is ambiguous where symbol-char-must-have
is matched, but since the branch point is in a loop SBNF also gets into a loop when trying to exhaustively produce contexts for the branch point.
I'll work on a fix for this, but here's a couple things that may help your understanding of SBNF:
SBNF translates into a set of sublime-syntax contexts. 'a' 'b'
translates into two contexts, one matching the regex 'a'
and sets the context that matches the regex 'b'
. Like most other sublime syntaxes, any amount of whitespace is allowed between the contexts. So 'a' 'b'
actually matches all of the following:
ab
a b
a
b
As such, SBNF rules can't replace sublime-syntax regexes. If you're trying to match a contiguous symbol you will always have to do so with regex, regardless of whether you're using sbnf or sublime-syntax. Here's how you'd rewrite symbol
to not require a branch point:
symbol
= '(?:\d|\.|[^\s()\'"/,:;|\d.]|(\\.))*[^\s()\'"/,:;|\d.]|(\\.)(?:\d|\.|[^\s()\'"/,:;|\d.]|(\\.))*'
{1: constant.character.escape, 2: constant.character.escape, 3: constant.character.escape}
;
# or
symbol = symbol-impl['[^\s()\'"/,:;|\d.]|(\\.)'] ;
symbol-impl = '(?:\d|\.|#[MUST_HAVE])*#[MUST_HAVE](?:\d|\.|#[MUST_HAVE])*'
{ 1: constant.character.escape
, 2: constant.character.escape
, 3: constant.character.escape
}
;
So
'a' 'b'
actually matches all of the following:
Thanks! Mental model adjusted. (I thought I would be creating a whitespace matcher and using that in between terminals.)
symbol = symbol-impl['[^\s()\'"/,:;|\d.]|(\\.)'] ; symbol-impl = '(?:\d|\.|#[MUST_HAVE])*#[MUST_HAVE](?:\d|\.|#[MUST_HAVE])*' { 1: constant.character.escape , 2: constant.character.escape , 3: constant.character.escape } ;
It took me a while to realize that #[MUST_HAVE]
is coming from this rule in arguments
An identifier that does not reference a rule is a free variable unique to the rule's scope. It matches any argument and may be passed on and or interpolated.
Does that imply that I can create and reuse variables like sublime-syntax has by doing this very contrived example:
octet = '(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:[1-9][0-9])|[0-9])' ;
ipv4-impl = '(?:(?:#[OCTET]\.){3}#[OCTET])'{constant.numeric.ip-address.v4} ;
ipv4 = ipv4-impl[octet] ;
rgb-impl = 'rgb(\()(#[LEVEL])(,)\s*(#[LEVEL])(,)\s*(#[LEVEL])(\))'
{ meta.foo
, 1: punctuation.section.begin
, 2: constant.numeric.integer.decimal
, 3: punctuation.separator.sequence
, 4: constant.numeric.integer.decimal
, 5: punctuation.separator.sequence
, 6: constant.numeric.integer.decimal
, 7: punctuation.section.end
} ;
rgb = rgb-impl[octet] ;
Ah, sorry. I made a mistake in constructing my suggestion. It's supposed to be:
symbol
= '(?:\d|\.|[^\s()\'"/,:;|\d.]|(\\.))*[^\s()\'"/,:;|\d.]|(\\.)(?:\d|\.|[^\s()\'"/,:;|\d.]|(\\.))*'
{1: constant.character.escape, 2: constant.character.escape, 3: constant.character.escape}
;
# or
symbol = symbol-impl['[^\s()\'"/,:;|\d.]|(\\.)'] ;
symbol-impl[MUST_HAVE]
= '(?:\d|\.|#[MUST_HAVE])*#[MUST_HAVE](?:\d|\.|#[MUST_HAVE])*'
{ 1: constant.character.escape
, 2: constant.character.escape
, 3: constant.character.escape
}
;
The MUST_HAVE
variable is required to also be in the argument list of the rule.
Does that imply that I can create and reuse variables like sublime-syntax has by doing this very contrived example:
Kind of. The problem with your example is that it's trying to interpolate a rule into a regex. There's no current way to declare a global variable, as such the only way to pass a regex/string to a rule is directly:
ipv4-impl[OCTET] = '(?:(?:#[OCTET]\.){3}#[OCTET])'{constant.numeric.ip-address.v4} ;
ipv4 = ipv4-impl['(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:[1-9][0-9])|[0-9])'] ;
rgb-impl[LEVEL]
= 'rgb(\()(#[LEVEL])(,)\s*(#[LEVEL])(,)\s*(#[LEVEL])(\))'
{ meta.foo
, 1: punctuation.section.begin
, 2: constant.numeric.integer.decimal
, 3: punctuation.separator.sequence
, 4: constant.numeric.integer.decimal
, 5: punctuation.separator.sequence
, 6: constant.numeric.integer.decimal
, 7: punctuation.section.end
} ;
rgb = rgb-impl['(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:[1-9][0-9])|[0-9])'] ;
Global variables would be a good addition imo, though a separate issue :)
I started working on a Lisp parser. I have middling experience with sublime-syntax, but none at all with BNF. Lisp lets you use almost any character in a symbol so long as it is not only digits and
.
s. (It gets worse than that with delimiters, but I didn't get that far.) I tried to represent that in the example below, but I blow the stack.I don't think there are any loops in the hierarchy, but I am prepared to be wrong.