eugenesvk opened this issue 1 year ago
My suggestion for this would be either:
number : '([+-])(?x:
([0-9]*)
|(...)
)'{0: sign, 1: whatever, 2: ...} ;
or
SIGN = '([+-])'
SIGN_SCOPE = 'keyword.operator.arithmetic'
decimal : '#[SIGN]...'{0: #[SIGN_SCOPE]} ;
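As a rough illustration of the first suggestion, here is a Python model in which one regex covers several number forms, so a single capture-to-scope table is written only once. The pattern, the group layout and the scope names are hypothetical stand-ins for the elided parts. Python's re module also only approximates Sublime Text's regex engine.

```python
import re

# Hypothetical number regex in the spirit of the first suggestion: one sign
# group, then one capture group per alternative number form.
NUMBER = re.compile(r"([+-]?)(?:0[xX]([0-9a-fA-F]+)|0[bB]([01]+)|([0-9]+))")

# One capture-index -> scope table shared by every alternative.
# In Python, group 0 is the whole match, so the sign is group 1 here.
SCOPES = {
    1: "keyword.operator.arithmetic",
    2: "constant.numeric.value.hex",
    3: "constant.numeric.value.binary",
    4: "constant.numeric.value.decimal",
}

def scope_number(text: str) -> list[tuple[str, str]]:
    """Return (text, scope) for every non-empty capture of a full match."""
    m = NUMBER.fullmatch(text)
    if m is None:
        return []
    return [(m.group(i), scope) for i, scope in SCOPES.items() if m.group(i)]

for sample in ("+0x1F", "42", "-0b101", "0b2"):
    print(sample, scope_number(sample))
```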
I do find the regex composition suggestion here interesting, but it requires implementing full regex parsing.
Thanks for the tips
Your first suggestion is what I started with, before trying to refactor it into the more readable list pasted in this issue, precisely because that approach requires repeating all the scope matches for every number type.
The second one I'm already using with CLAUSES for scope prefix abbreviations (like #[S_M_INT]decimal, where S_M_INT (scope, meta, int) is meta.number.integer, so you get meta.number.integer.decimal). I'll extend it to individual scopes like S_SIGN and see how it works.
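The #[...] abbreviation trick can be modeled as plain string interpolation. This is a minimal Python sketch that assumes variables are substituted textually. The values given to S_M_INT and S_SIGN (including the trailing dot) are assumptions, and this is not SBNF's actual implementation.

```python
import re

# Hypothetical variable table; the values are assumptions for illustration.
VARIABLES = {
    "S_M_INT": "meta.number.integer.",
    "S_SIGN": "keyword.operator.arithmetic",
}

def interpolate(template: str, variables: dict[str, str]) -> str:
    """Replace every #[NAME] in template with variables[NAME]."""
    def substitute(m: re.Match) -> str:
        name = m.group(1)
        if name not in variables:
            raise KeyError(f"undefined variable: {name}")
        return variables[name]
    return re.sub(r"#\[([A-Za-z_][A-Za-z0-9_]*)\]", substitute, template)

print(interpolate("#[S_M_INT]decimal", VARIABLES))
print(interpolate("#[S_SIGN]", VARIABLES))
```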
> I do find the regex composition suggestion here interesting, but it requires implementing full regex parsing.
By the way, would the (nonexistent) 'whitespace-strict' mode in Sublime syntax parser achieve the same? In that mode, would a rule combo_a_b: rule_a rule_b; (with each rule matching a single char) only match ab, but not a b?
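To make the question concrete, here is a toy Python model comparing a composed regex with rules matched one after another. The skip_whitespace flag is a hypothetical stand-in for the proposed 'whitespace-strict' mode. Neither function reflects how SBNF or Sublime Text is actually implemented.

```python
import re

def composed_regex(text: str) -> bool:
    """One regex for the whole sequence: 'a' must be directly followed by 'b'."""
    return re.fullmatch(r"(a)(b)", text) is not None

def rule_sequence(text: str, rules: list[str], skip_whitespace: bool) -> bool:
    """Match single-character rules in order. With skip_whitespace=True the
    rules behave like separate matches that ignore spaces in between; with
    False they behave like the hypothetical 'whitespace-strict' mode."""
    pos = 0
    for rule in rules:
        if skip_whitespace:
            while pos < len(text) and text[pos].isspace():
                pos += 1
        if pos < len(text) and text[pos] == rule:
            pos += 1
        else:
            return False
    return pos == len(text)

for sample in ("ab", "a b"):
    print(repr(sample),
          composed_regex(sample),
          rule_sequence(sample, ["a", "b"], skip_whitespace=True),
          rule_sequence(sample, ["a", "b"], skip_whitespace=False))
```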
> By the way, would the (nonexistent) 'whitespace-strict' mode in Sublime syntax parser achieve the same?
Sublime Syntax doesn't care about whitespace in any way; it's the same as any other character. The primary difference between simple regexes and match stacks is that the former only work on a single line, whereas the latter work across multiple lines. Adding a way for SBNF to generate matches that are whitespace sensitive would still result in a meaningfully different grammar than a composed regex.
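The single-line versus multi-line distinction can be shown with another toy model. A regex applied line by line can't see across a line break, while a context stack keeps its state between lines. The function names and the a/b grammar are illustrative only.

```python
import re

def regex_per_line(lines: list[str], pattern: str) -> bool:
    """Regexes are applied one line at a time, so a match cannot span lines."""
    return any(re.search(pattern, line) for line in lines)

def stack_across_lines(lines: list[str]) -> bool:
    """Toy context stack: after 'a' we push an 'expect-b' context, and that
    context survives into the next line until a 'b' pops it. Whitespace and
    other characters are ignored."""
    stack = ["main"]
    for line in lines:
        for ch in line:
            if ch.isspace():
                continue
            if stack[-1] == "main" and ch == "a":
                stack.append("expect-b")  # push
            elif stack[-1] == "expect-b" and ch == "b":
                stack.pop()               # pop: 'a' ... 'b' has been seen
                return True
    return False

print(regex_per_line(["a", "b"], r"a\s*b"), stack_across_lines(["a", "b"]))
```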
> it's the same as any other character
Ah, OK, then that mode would indeed be SBNF-related, not ST-related.
> Adding a way for SBNF to generate matches that are whitespace sensitive would still result in a meaningfully different grammar than a composed regex.
So if our goal is matching ab like so
main:
  - match: '(a)(b)'
    captures:
      1: punctuation.separator.char.a.kdl2
      2: punctuation.separator.char.b.kdl2
    pop: true
(or in SBNF with regexes, not rules)
a-then-b: '(a)(b)' {1:punctuation.separator.char.a, 2:punctuation.separator.char.b};
main : ~a-then-b;
then the SBNF rules even with a special mode
rule-a : 'a'{punctuation.separator.char.a};
rule-b : 'b'{punctuation.separator.char.b};
a-then-b: (?special-mode-whitespace-sensitive rule-a rule-b);
instead of ↓ (which matches a b)
main:
  - match: 'a'
    scope: punctuation.separator.char.a.kdl2
    push: rule-b|0
    pop: true
# Rule: rule-b
rule-b|0:
  - match: 'b'
    scope: punctuation.separator.char.b.kdl2
    pop: true
  - match: '\S'
    scope: invalid.illegal.kdl2
    pop: true
would still not be able to generate ↓ (which seems equivalent to the regex? But I could easily be mistaken, so it may still be meaningfully different)
contexts:
  main:
    - match: 'a(?=b)' # only match if next is 'b'
      scope: punctuation.separator.char.a
      push: b
      pop: true
  b:
    - match: '(?!b)' # bail on non-'b'
      pop: true
    - match: 'b'
      scope: punctuation.separator.char.b
      pop: true
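Here is a toy Python model of the three variants above, using the same a/b grammar (scope suffixes like .kdl2 are dropped). It only approximates Sublime Text's behavior for illustration; in particular it ignores the pop: true on main's matches, and the function names are made up.

```python
import re

A = "punctuation.separator.char.a"
B = "punctuation.separator.char.b"
INVALID = "invalid.illegal"

def single_regex(text: str) -> list[tuple[str, str]]:
    """'(a)(b)' with captures, applied at the start of the text:
    both chars are scoped only if they are adjacent."""
    m = re.match(r"(a)(b)", text)
    return [(m.group(1), A), (m.group(2), B)] if m else []

def rule_contexts(text: str) -> list[tuple[str, str]]:
    """Generated rule contexts: 'a' pushes a context that scopes the next
    non-space char as 'b' or invalid. No pattern consumes whitespace, so
    it is skipped."""
    tokens: list[tuple[str, str]] = []
    expecting_b = False
    for ch in text:
        if ch.isspace():
            continue
        if expecting_b:
            tokens.append((ch, B if ch == "b" else INVALID))
            expecting_b = False  # rule-b|0 pops after one match
        elif ch == "a":
            tokens.append((ch, A))
            expecting_b = True   # push rule-b|0
    return tokens

def lookahead_contexts(text: str) -> list[tuple[str, str]]:
    """'a(?=b)' only matches an 'a' immediately followed by 'b'; the pushed
    context then scopes that 'b'."""
    tokens: list[tuple[str, str]] = []
    pattern = re.compile(r"a(?=b)")
    pos = 0
    while pos < len(text):
        if pattern.match(text, pos):
            tokens += [("a", A), ("b", B)]
            pos += 2
        else:
            pos += 1  # characters matched by no pattern stay unscoped
    return tokens

for sample in ("ab", "a b", "ac"):
    print(repr(sample), single_regex(sample), rule_contexts(sample), lookahead_contexts(sample))
```

In this model the rule-based contexts are the only variant that accepts a b, and they scope the a in ac as valid.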
> would still not be able to generate ↓ (which seems equivalent to the regex? But I could easily be mistaken, so it may still be meaningfully different)
It's very meaningfully different. With just the regex, ac wouldn't match and both characters would be marked as invalid. With those rules, aa is highlighted as valid, which is just incorrect. In fact, the only way to actually do it correctly is with a branch point, so that when ST reaches the 2nd character it goes back and highlights the first one as invalid.
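The branch-point behavior described above can be approximated as follows. Scopes for the a-then-b alternative are kept only if the 2nd character is b; otherwise the parser falls back from the saved position and marks the characters as invalid. This is a simplified Python model, not Sublime Text's actual branch/fail implementation.

```python
A = "punctuation.separator.char.a"
B = "punctuation.separator.char.b"
INVALID = "invalid.illegal"

def highlight_with_branch(text: str) -> list[tuple[str, str]]:
    """Toy model of a branch point: try the 'a then b' alternative and keep
    its scopes only if it succeeds; otherwise fall back to marking the
    character at the branch point as invalid."""
    tokens: list[tuple[str, str]] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
            continue
        if ch == "a":
            # Branch point at i: the alternative tentatively scopes 'a' ...
            tentative = [("a", A)]
            j = i + 1
            # ... and only commits if the very next character is 'b'.
            if j < len(text) and text[j] == "b":
                tokens.extend(tentative + [("b", B)])
                i = j + 1
                continue
            # The alternative failed on the 2nd character: the tentative scope
            # is discarded and we resume from i with the fallback below.
        tokens.append((ch, INVALID))
        i += 1
    return tokens

for sample in ("ab", "ac", "aa"):
    print(repr(sample), highlight_with_branch(sample))
```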
I've been bitten by the inability to use rules in regexes, which forces me to fall back to a composition of literal regexes (https://github.com/BenjaminSchaaf/sbnf/issues/12). I've also read https://github.com/BenjaminSchaaf/sbnf/issues/14 and https://github.com/BenjaminSchaaf/sbnf/issues/4, a couple of more tangentially related issues.
But I'm still not clear on the best way to avoid some of this repetition, so I thought I'd provide an example and ask for advice on how best to deal with the situation:
Let's say I'd like to define the rules for number syntax in the following relatively clean manner
(Ideally I'd have a function taking a list of 3 parameters, hex, octal and binary, that would do the rest, but that's a separate issue.)
The benefit of sign over [+-] is that I don't have to repeat the scope definition in every rule. Same with the base function (d-hex etc. are d1-hex d1-hex-*, where finally d1-hex is a primitive regex of '[0-9a-fA-F]' and d1-hex- is the same with an _ (although _ isn't even allowed in rule names, so I can't express that properly in the name, but that's yet another issue), all with proper scopes).
But then this doesn't work, since the rules create different syntaxes instead of a combined regex, so they'd fail when a number like 0b1 111 is space-separated. As I've read in another issue, there is no way around it: to match consecutive symbols, you need regexes.
So I need to have a single regex, but rules aren't allowed inside regexes, so I'd have to create primitive regexes and store them in CLAUSES as global variables instead of rules. And then, for every match in every number, I'd have to repeat the group scope definitions :( Also, breaking everything down into a list of primitive regexes isn't very ergonomic in more composable situations.
What is the best current/planned way to solve this issue?
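The composition described in this issue (primitive regex pieces, each with its own scope, concatenated into one regex) can be sketched outside SBNF. This Python model numbers the capture groups automatically, so the group scopes are derived once rather than repeated for every number type. Fragment, compose and to_sbnf_terminal are hypothetical helpers, the scope names are placeholders, and the rendered text assumes the '...'{1: scope} option syntax used earlier in this thread.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Fragment:
    """A primitive regex piece plus the scope its capture group should get.
    Assumes the regex has no capturing groups of its own (use (?:...))."""
    regex: str
    scope: str

def compose(*fragments: Fragment) -> tuple[str, dict[int, str]]:
    """Wrap each fragment in one capture group, concatenate them into a
    single regex, and number the groups' scopes automatically."""
    pattern = "".join(f"({frag.regex})" for frag in fragments)
    captures = {i: frag.scope for i, frag in enumerate(fragments, start=1)}
    return pattern, captures

def to_sbnf_terminal(pattern: str, captures: dict[int, str]) -> str:
    """Render as '<regex>'{1: scope, ...}; assumes no single quotes in the regex."""
    options = ", ".join(f"{i}: {scope}" for i, scope in sorted(captures.items()))
    return f"'{pattern}'{{{options}}}"

# Shared pieces, defined once (names and scopes are placeholders).
SIGN = Fragment(r"[+-]?", "keyword.operator.arithmetic")
HEX_PREFIX = Fragment(r"0[xX]", "constant.numeric.base.hex")
HEX_DIGITS = Fragment(r"[0-9a-fA-F][0-9a-fA-F_]*", "constant.numeric.value.hex")  # like d1-hex d1-hex-*
BIN_PREFIX = Fragment(r"0[bB]", "constant.numeric.base.binary")
BIN_DIGITS = Fragment(r"[01][01_]*", "constant.numeric.value.binary")

hex_pattern, hex_captures = compose(SIGN, HEX_PREFIX, HEX_DIGITS)
bin_pattern, bin_captures = compose(SIGN, BIN_PREFIX, BIN_DIGITS)

print(to_sbnf_terminal(hex_pattern, hex_captures))
print(to_sbnf_terminal(bin_pattern, bin_captures))
```

Because everything ends up in one regex, a space-separated 0b1 111 no longer matches. The trade-off is that fragments must avoid their own capturing groups, or the automatic numbering drifts.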