Closed: aw632 closed this issue 4 months ago
Interegular implicitly has anchors at the start and end, i.e. it behaves like re.fullmatch. This makes these symbols unnecessary and they are probably never going to be added. I might at some point implement them in the parser as more or less no-ops.
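To make the fullmatch comparison concrete, here is a minimal sketch using only Python's standard `re` module (not interegular itself). The helper names `full` and `search` are made up for illustration.

```python
import re


def full(pattern: str, text: str) -> bool:
    """True if the pattern matches the whole string (implicit anchors)."""
    return re.fullmatch(pattern, text) is not None


def search(pattern: str, text: str) -> bool:
    """True if the pattern matches anywhere in the string."""
    return re.search(pattern, text) is not None


# Without explicit anchors, search finds "abb" inside the longer string,
# while fullmatch rejects it because of the extra prefix and suffix.
unanchored_search = search(r"ab+", "xabbz")
unanchored_full = full(r"ab+", "xabbz")

# Adding ^ and $ to a search gives the same answer as fullmatch without them.
anchored_search = search(r"^ab+$", "xabbz")
exact_full = full(r"ab+", "abb")
```

Under fullmatch semantics, writing `^ab+$` adds nothing over `ab+`, which is the sense in which the anchors are unnecessary.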
Thanks for the clarification, this will help me narrow down my issue. Would you be able to point out where the implicit anchoring is performed? I looked at https://github.com/MegaIng/interegular/blob/master/interegular/fsm.py but wasn't able to find it.
It's not performed anywhere, which is why it's implicit. The fsm class accepts a string if it exactly matches the state transitions produced by the regex, and there are no state transitions for an arbitrary prefix/postfix.
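As a rough illustration of why no explicit anchoring step is needed, here is a toy DFA for `ab+`. This is a hand-written sketch, not interegular's actual fsm class; the `accepts` function and the `AB_PLUS` table are hypothetical names used only for this example.

```python
def accepts(transitions, initial, finals, text):
    """Run a toy DFA over text; accept only if every character has a
    transition and the run ends in a final state."""
    state = initial
    for ch in text:
        key = (state, ch)
        if key not in transitions:
            # No transition for this character: the string is rejected.
            return False
        state = transitions[key]
    return state in finals


# Transition table for the regex "ab+": 0 --a--> 1 --b--> 2 --b--> 2
AB_PLUS = {(0, "a"): 1, (1, "b"): 2, (2, "b"): 2}

exact = accepts(AB_PLUS, 0, {2}, "abb")          # consumes the whole input
with_prefix = accepts(AB_PLUS, 0, {2}, "xabb")   # no (0, "x") transition
with_suffix = accepts(AB_PLUS, 0, {2}, "abbz")   # no (2, "z") transition
```

The prefix and suffix cases fail simply because the table has no entries for `x` or `z`, so the anchoring falls out of the acceptance rule rather than from any dedicated code.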
But also, if you want to mess with the FSM directly, interegular is the wrong library for you. Use greenery instead. Yes, greenery by default doesn't support lookahead/-behind, and there are reasons for that. I am using a hack to support them within the domain I care about. If you want, you could reimplement the hack on top of greenery's public FSM module.
My current regex use case requires me to have ^ and $, otherwise the regex will match extraneous strings. Here's an example: https://regex101.com/r/JFtuVu/1
This is an example of the regex I'm trying to match and build an FSM from. The string must either start with a fixed prefix (APPLE), followed by some other regex (here, the regex for a JSON schema) and nothing else, or, if it does not start with APPLE, it can be any string.
It would be really useful if interegular could support ^ and $, as these are pretty foundational regex components, and without them (and without lookaheads as well) I can't use this kind of regex with interegular.
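For reference, one possible way to write this requirement under fullmatch semantics, without ^ or $, is sketched below using Python's standard `re` module. The `JSON_LIKE` sub-pattern is a hypothetical placeholder, not the actual JSON schema regex from the regex101 link, and the sketch relies on a negative lookahead, so it only carries over to libraries that support lookaheads.

```python
import re

# Hypothetical stand-in for the real JSON schema regex.
JSON_LIKE = r"\{.*\}"

# Either "APPLE" followed by the JSON-like part and nothing else,
# or any string that does not start with "APPLE".
PATTERN = re.compile(r"APPLE" + JSON_LIKE + r"|(?!APPLE).*", re.DOTALL)


def matches(text: str) -> bool:
    """Whole-string match, mirroring implicit start and end anchors."""
    return PATTERN.fullmatch(text) is not None


apple_with_json = matches('APPLE{"a": 1}')
apple_with_trailing = matches("APPLE{}trailing")
other_string = matches("banana")
apple_alone = matches("APPLE")
```

Because the match must cover the whole string, the first alternative already excludes trailing text, and the lookahead in the second alternative keeps strings that start with APPLE from slipping through.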