Open fmmoret opened 9 months ago
outlines.fsm.json_schema.build_regex_from_object(json.dumps(Example.model_json_schema()))
'\\{[\\n ]*"a"[\\n ]*:[\\n ]*"(?:[^"\\\\\\x00-\\x1f\\x7f-\\x9f]|\\\\.){,60}"[\\n ]*,[\\n ]*"b"[\\n ]*:[\\n ]*\\[[\\n ]*(("(?:[^"\\\\\\x00-\\x1f\\x7f-\\x9f]|\\\\.){,20}")(,[\\n ]*("(?:[^"\\\\\\x00-\\x1f\\x7f-\\x9f]|\\\\.){,20}")){0,4})?[\\n ]*\\][\\n ]*,[\\n ]*"c"[\\n ]*:[\\n ]*"(?:[^"\\\\\\x00-\\x1f\\x7f-\\x9f]|\\\\.){,200}"[\\n ]*,[\\n ]*"d"[\\n ]*:[\\n ]*\\[[\\n ]*(("(?:[^"\\\\\\x00-\\x1f\\x7f-\\x9f]|\\\\.){,100}")(,[\\n ]*("(?:[^"\\\\\\x00-\\x1f\\x7f-\\x9f]|\\\\.){,100}")){0,4})?[\\n ]*\\][\\n ]*,[\\n ]*"e"[\\n ]*:[\\n ]*\\[[\\n ]*(("(?:[^"\\\\\\x00-\\x1f\\x7f-\\x9f]|\\\\.){,20}")(,[\\n ]*("(?:[^"\\\\\\x00-\\x1f\\x7f-\\x9f]|\\\\.){,20}")){0,14})?[\\n ]*\\][\\n ]*,[\\n ]*"f"[\\n ]*:[\\n ]*\\[[\\n ]*(("(?:[^"\\\\\\x00-\\x1f\\x7f-\\x9f]|\\\\.){,30}")(,[\\n ]*("(?:[^"\\\\\\x00-\\x1f\\x7f-\\x9f]|\\\\.){,30}")){0,14})?[\\n ]*\\][\\n ]*\\}'
Yes, this is a very complex pattern and results in over 5,000 FSM states, each of which contains a set of valid tokens generated by traversing the FSM.
I agree 100% with the idea of splitting unions into multiple FSMs. For concatenated patterns it might become more complicated.
Related: https://github.com/outlines-dev/outlines/discussions/640
Is there an elegant way you can picture doing so while still being able to use the vLLM & pydantic integrations?
It's quite complex, I've been thinking about this problem for a bit. I'll ping you when I post the write-up I've been working on. (Concatenation and union operations between RegexFSM
are also critical to performance of CFGFSM
, but it relates here as well).
I see a few ways we could dramatically reduce the memory that these FSMs take:
Which solution we should prioritize is very much an empirical question at this point.
Is there something tangential out there we could use instead of fsms?
This guy is claiming 10,000-100,000 times faster json constraining. https://twitter.com/jrysana/status/1765687350363906278
Is there something tangential out there we could use instead of fsms?
There are a few things to try before completely replacing the DFA logic. For instance, how many of those transitions correspond to "allow all tokens and go to the same state"? If many, we could treat this as a special case and it would decrease the memory requirements dramatically. We could store ranges instead of token ids as well.
This guy is claiming 10,000-100,000 times faster json constraining. https://twitter.com/jrysana/status/1765687350363906278
We are yet to see the code. Anyway that's compilation time, and a few things can be done to improve this in Outlines we just didn't have the time to get around to it.
Describe the issue as clearly as possible:
This uses up a huge amount of RAM and hangs for a long time -- 1600s on my machine with 32GB ram, current gen gaming laptop. Reducing some key constraints like
constr(max_length=100)
ind
reduces memory usage & initial computation time in exponential fashion.What are the recommended sets of workarounds? Is there a cacheless / lazy eval mode? Could we choose to split the JSON FSM into multiple & concat to reduce explosion at all of the control flow points in the regex?
Steps/code to reproduce the bug:
Expected result:
Error message:
No response
Outlines/Python version information:
Version information
Context for the issue:
No response