slow regex generation

With a plain `pip install outlines` (it didn't prompt me to install anything else), it took over an hour to generate 512 tokens constrained to the regex `r"```latex(.*?|\n)```"`. The FSM compiled to 100% fairly quickly, though. This was a 2B 4-bit model on a 3090, and all 24 GB of VRAM were filled during generation (my prompt is only about 20 tokens).
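For reference, here is a small sketch of what that pattern accepts, checked with Python's standard `re` module. This is an assumption about equivalent behavior: Outlines compiles the regex into an FSM with its own machinery, which may treat `.` differently. `MULTILINE_PATTERN` and `accepts` are hypothetical names for illustration and do not come from the issue. The sketch says nothing about the cause of the slowness; it only shows that, under `re` semantics, `(.*?|\n)` allows either a single-line body or one lone newline.

```python
import re

# Build the fence from single backticks so this snippet can sit inside a fenced block.
FENCE = "`" * 3

# The pattern from the post, i.e. FENCE + "latex(.*?|\n)" + FENCE.
PATTERN = re.compile(FENCE + r"latex(.*?|\n)" + FENCE)

# Hypothetical variant (not from the post) whose body may span several lines.
MULTILINE_PATTERN = re.compile(FENCE + r"latex((?:.|\n)*?)" + FENCE)


def accepts(pattern, text):
    """Return True if the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None


single_line = FENCE + "latex x^2 " + FENCE
multi_line = FENCE + "latex\nx^2\n" + FENCE

print(accepts(PATTERN, single_line))           # True
print(accepts(PATTERN, multi_line))            # False: "." does not match "\n"
print(accepts(MULTILINE_PATTERN, multi_line))  # True
```

If the intent is a multi-line LaTeX body, a variant along the lines of `MULTILINE_PATTERN` may be closer to what is wanted, though whether it changes generation speed is untested here.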

I had a similar experience in the past, which I "solved" by using HuggingFace's TGI for structured generation. It was a lot faster, which is weird because I thought they used Outlines under the hood.

Should I have gone with an inference engine, e.g. `pip install outlines[vllm]`?

Originally posted by @ahmed-moubtahij in https://github.com/dottxt-ai/outlines/issues/1149#issuecomment-2354062820


slow regex generation #1167