Open mwootten opened 3 months ago
This is probably a side effect of interregular, which has implicit anchoring: https://github.com/MegaIng/interegular/issues/10
Bringing Outlines into compliance with the JSON schema spec here would unfortunately be a breaking change, as it's likely that at least some users have been expecting the patterns they wrote to be exact matches rather than partial matches.
Possibly, however this feature didn't even work until 2 months ago https://github.com/outlines-dev/outlines/commit/60e89f5706e3d0f9837e271e04a39fb6e81d92df
A good approach might be to implement this, and provide a warning describing the new behavior and mitigation strategies if a pattern is used.
Describe the issue as clearly as possible:
The JSON schema specification states, of the
pattern
keyword:This means that, for instance,
{"type": "string", "pattern": "abcd"}
should match"before abcd after"
. However, the regular expression Outlines currently generates acts as if the pattern is implicitly anchored, and that schema is interpreted as if it was{"type": "string", "pattern": "^abcd$"}
Steps/code to reproduce the bug:
Expected result:
Error message:
No response
Outlines/Python version information:
Version information
Context for the issue:
Bringing Outlines into compliance with the JSON schema spec here would unfortunately be a breaking change, as it's likely that at least some users have been expecting the patterns they wrote to be exact matches rather than partial matches.