SouravKB closed this issue 1 year ago.
It is fine to assume that a Perl regex matcher is used (because of the `\1` backreference). Just use option `-m` or `%matcher` to specify a matcher; see the documentation.
However, there are some things to be aware of: `\1` applies globally to all pattern regexes combined. If this is the first pattern in the list, then `\1` is probably fine. Otherwise, you may have to use a different index or, better, use named group captures.

This is actually not really a RE/flex question or issue, but a general tokenization or regex question, for which there are other venues to obtain advice.
What you want can best be done with the usual POSIX matcher of Flex and RE/flex, by counting the number of opening quotes and matching the same number at the closing delimiter.
That's what I was trying to do here (I forgot to mention that I was using the reflex matcher). In my first comment above, I provided an example lexer specification that almost works, but it is not fully correct, so I needed help.
It is also true that I should have asked this in a Q&A forum. But the question isn't a general regex question; it is specific to the abilities of the reflex matcher. Hence I decided to ask here.
I want to match the syntax `('*)".*?"\1`. It should match `"foo"` and `''"bar"''`, but should not match `''"baz"'`. Is there a correct way to match this using the reflex matcher? The nearest I could achieve was the following lexer:

The problem with this is that it matches only valid UTF-8 inside strings, but I want anything inside a string to be matched. I considered three workarounds, but all three seem to have some issues.
1. Using `skip()`. This skips all characters until it reaches the delimiter, but in the process it consumes all the string content, so I don't get to keep it.
2. Using `.*?\"` instead of `[^"]*`. This works for every properly terminated string, but gets the lexer jammed if the string is not terminated.
3. Using `.`. Since `.` is synchronizing, it can even match invalid UTF-8 sequences. But this approach feels way too slow.

So is there any better approach for solving this?