S05 (Regexes and Rules) has a section on longest-token matching. Its paragraph on tie-breaking begins:
"However, if two alternatives match at the same length, the tie is broken ..."
A later sentence adds a further tie-breaking rule:
"If the alternatives are in the same grammar file, the textually earlier alternative takes precedence."
There appear to be no tests for this tie-breaking rule yet. An IRC discussion with @jnthn indicates that current Rakudo is intended to implement it. The discussion starts with an example of tie-breaking for '|' alternation, but jnthn also mentions alternation with protoregexes. Tests for '|' alternation would be expected in S05-metasyntax/longest-alternative.t, and the proto tests might belong in S05-metasyntax/proto-token-ltm.t; currently neither file appears to test this rule.
Below are preliminary examples of each case, for review and comment, if needed, before I formalize them into pull requests against the test files.
The IRC example covers '|' alternation and is included below. I also expect it to be the basis of a related RT ticket, to be filed soon.
```perl6
use Test;

grammar text-order {
    # Each alternation offers two same-length (length 1) matches for '1'
    token alt-na_1 { <n> | <na_1> }
    token alt-na_2 { <n> | <na_2> }
    token n    { <digit>+ }
    token na_1 { <+alpha +digit>+ }
    token na_2 { <+alpha +[0..9]>+ }
}

# <n> is textually earlier in both alternations, so it should win the tie
is text-order.parse('1', :rule<alt-na_1>).keys[0], 'n',
    'Match first textual rule OK';
is text-order.parse('1', :rule<alt-na_2>).keys[0], 'n',
    'Match second textual rule fail';

done-testing;
```
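When these tests are formalized for S05-metasyntax/longest-alternative.t, one possible shape is the usual roast layout: an explicit `plan`, plus a fudge marker on the case that the example above labels as failing. This is only a sketch under those assumptions. The test descriptions and the todo reason are placeholders, and the `#?rakudo todo` line is an ordinary comment unless the file is run through roast's fudge tooling.

```perl6
use Test;

plan 2;

grammar text-order {
    # Both alternatives match '1' at the same length and neither starts
    # with a literal, so textual order is what should decide the winner.
    token alt-na_1 { <n> | <na_1> }
    token alt-na_2 { <n> | <na_2> }
    token n    { <digit>+ }
    token na_1 { <+alpha +digit>+ }
    token na_2 { <+alpha +[0..9]>+ }
}

is text-order.parse('1', :rule<alt-na_1>).keys[0], 'n',
    'textually earlier alternative wins a same-length tie (<digit>)';

# Hypothetical fudge marker: only honoured when run through fudge;
# the reason string is a placeholder until the RT ticket exists.
#?rakudo todo 'textual-order tie-break with [0..9] class'
is text-order.parse('1', :rule<alt-na_2>).keys[0], 'n',
    'textually earlier alternative wins a same-length tie ([0..9])';
```

Keeping the failing case as a todo, rather than leaving it out, records the expected behaviour so the test starts passing as soon as the implementation is fixed.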
I have also written an example for protoregexes:
```perl6
# https://design.perl6.org/S05.html#Longest-token_matching says:
#   If the alternatives are in the same grammar file, the
#   textually earlier alternative takes precedence.
#
# Here the first-declared multi is expected to win, which is hopefully
# the same rule and not an MRO tiebreak.
use Test;

grammar text-order {
    # <n> declared before <na_1>
    proto token alt-na_1 {*}
    multi token alt-na_1:sym<n>    { <n> }
    multi token alt-na_1:sym<na_1> { <na_1> }

    # <n> declared before <na_2>
    proto token alt-na_2 {*}
    multi token alt-na_2:sym<n>    { <n> }
    multi token alt-na_2:sym<na_2> { <na_2> }

    # Reversed order: <na_1> declared before <n>
    proto token alt-na_3 {*}
    multi token alt-na_3:sym<na_1> { <na_1> }
    multi token alt-na_3:sym<n>    { <n> }

    token n    { <digit>+ }
    token na_1 { <+alpha +digit>+ }
    token na_2 { <+alpha +[0..9]>+ }
}

is text-order.parse('1', :rule<alt-na_1>).keys[0], 'n',
    'Match first multi';
is text-order.parse('1', :rule<alt-na_2>).keys[0], 'n',
    'Match first multi with different second';
is text-order.parse('1', :rule<alt-na_3>).keys[0], 'na_1',
    'Match first multi declared second';

done-testing;
```