Open alabamenhu opened 3 years ago
Effectively, the problem is that there are two tokens, but only one way to set precedence. Two solutions I can see would be either a new trait, `is eager` (or, to allow similar degrees of granularity, `more-eager`, `as-eager`, `less-eager`), or to let the precedence traits take two arguments, with `*` (whatever) for default. So in the case of the html example, we'd say `circumfix:«<html> </html>» is tighter(*, &infix:«<»)`.
I'd lean to the latter, although admittedly someone who wanted to set the initial and terminal tokens with different precedence types might have an odd-looking definition (`is tighter(foo,*) is looser(*,bar)`), but such degrees of specificity would likely be rare.
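For reference, a minimal sketch of how `is tighter` behaves today on an ordinary infix, where it takes a single operator. The `⊕` operator is a made-up name for illustration only; the two-argument form discussed above is just a proposal and is not shown.

```raku
# Hypothetical infix for illustration: adds its operands, but binds tighter than `*`.
sub infix:<⊕>($a, $b) is tighter(&infix:<*>) { $a + $b }

# Parsed as 2 * (3 ⊕ 4) because ⊕ is tighter than *.
say 2 * 3 ⊕ 4;   # 14
```

With a circumfix there are two tokens but still only one argument to the trait, which is the gap the proposals above try to address.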
There may be another issue here, which is that as described above, I think that there might be a problem with the tokenizer. It seems to me that longest token matching should treat `</html>` as a single token rather than backtracking to looking at it as separate `<` and `/` tokens. Either that or the problem isn't being sufficiently described in this ticket yet.
Pm
That is definitely a possibility I thought of. An extreme test case might be an idea I had toyed with back in the day:
sub circumfix:<¿ ?> (\x) { so x }
sub circumfix:<¡ !> (\x) { not x }
These would actually conflict with the prefix tokens `!` and `?`. If the issue revolved around LTM, do you think that that would be possible without needing to separately set the precedence of the terminal `!` / `?`?
Problem
While it is possible to adjust the precedence of a circumfix operator by using `is tighter()`, that only applies to the precedence of the start token. Once inside the circumfix, the terminal token is only considered a possibility after infixes and postfixes have been considered.
Details
This is problematic, as it means that the terminal token cannot begin with any of the same characters as an infix or postfix, so many terminals that may be desired (and logically make sense) cannot be used. For instance, the following sequences are, in effect, verboten as a terminal circumfix onset:
, : Z X ff | ^ / + - * ~ o x ? ! < [ ( {
as well as a few other two- and three-letter sequences. If creating a circumfix operator where there is also potentially a custom infix/postfix operator, this could be even more limiting.
As a practical example, see comments on reddit. Creating a `circumfix:«<html> </html>»` (which a user may logically wish to do) fails under the current parse. This is because `&infix:«<»` has tighter precedence. Using `is tighter(&infix:«<»)` does not change the parse.
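As a point of contrast, here is a minimal sketch of a circumfix whose terminal does not begin with any infix or postfix character, which sidesteps the collision described above. The `HTML` / `ENDHTML` delimiters are hypothetical names chosen only for illustration.

```raku
# Hypothetical delimiters: neither token starts with an infix or postfix.
sub circumfix:<HTML ENDHTML>($content) {
    "<html>" ~ $content ~ "</html>"
}

say HTML "hello" ENDHTML;
```

This works around the issue rather than solving it: the `«<html> </html>»` spelling a user would naturally reach for is still unavailable.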