khalo-sa closed this issue 2 months ago
There might be some problems with outlines printing debug info. Can you try lmfe instead? Pass "guided_decoding_backend": "lm-format-enforcer"
in your request.
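For reference, a request with that field set might look roughly like the sketch below. The endpoint URL, the port, and the model name are assumptions for illustration and not taken from this thread. Only guided_regex and guided_decoding_backend come from the discussion above. Adjust everything else to your deployment.

```python
import json
import urllib.request

# Assumption: an OpenAI-style completions endpoint on localhost.
# Change host, port, and path to match your own server.
API_URL = "http://localhost:2242/v1/completions"


def build_payload(prompt, regex, backend="lm-format-enforcer", model="my-model"):
    """Build a request body asking for regex-guided output via the given backend."""
    return {
        "model": model,  # hypothetical model name
        "prompt": prompt,
        "guided_regex": regex,
        "guided_decoding_backend": backend,
    }


def send(payload, url=API_URL, timeout=60):
    """POST the payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # The timeout keeps the client from waiting forever if the server hangs.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the body; call send(payload) against a running server.
    payload = build_payload("Pick a digit: ", r"[0-9]")
    print(json.dumps(payload, indent=2))
```

Switching back to outlines would then only mean passing a different backend value to build_payload.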
outlines needs to compile the regex first, and that step is very slow; lmfe has no such compile step. On the other hand, once the regex is compiled, outlines is faster than lmfe. So outlines is only suitable when the same regex is applied to all subsequent requests.
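To make that trade-off concrete, here is a small break-even sketch. All timings are made-up placeholders, not measurements of outlines or lmfe. It only shows how a one-time compile cost gets amortized when the same regex is reused.

```python
def break_even_requests(compile_ms, outlines_ms, lmfe_ms):
    """Return the smallest number of requests (same regex) at which a
    compile-once backend becomes strictly cheaper in total time than a
    backend without a compile step. Returns None if it never does.

    All arguments are integer milliseconds and purely hypothetical.
    """
    saving_per_request = lmfe_ms - outlines_ms
    if saving_per_request <= 0:
        # No per-request saving, so the compile cost is never recovered.
        return None
    # Need n * saving_per_request > compile_ms, i.e. n > compile_ms / saving.
    return compile_ms // saving_per_request + 1


if __name__ == "__main__":
    # Placeholder numbers: 10 s compile, 50 ms vs 250 ms per request.
    print(break_even_requests(10_000, 50, 250))
```

With those placeholder numbers the compile step pays for itself only after a few dozen requests with the same regex. If every request uses a different regex, it never does.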
Hey @sgsdxzy, you were right. I only get the excessive Outlines logs when APHRODITE_LOG_LEVEL is set to DEBUG. But I wonder whether the sheer volume of logs makes the server crash/hang, or whether some other side effect is responsible. Either way, not setting APHRODITE_LOG_LEVEL solved it for me.
Your current environment
🐛 Describe the bug
Hey guys, I'm running aphrodite in Docker with the following compose file:
Once the server has started, I can send normal generation requests without issues. However, as soon as I send the first request with a guided_json or guided_regex parameter, the request never terminates, and I get logs so long that I attached them as a file here: aphrodite-startup-bug.txt
As you can see in the logs, after about 10 minutes the crash logs disappear and I can send the same type of request (with guided_json/guided_regex) again without the error. However, I think I have experienced the same error spontaneously again later on. As I depend on structured generation in my projects, I currently cannot rely on Aphrodite, which is a real bummer.
While in the error state, the server is stuck with CPU Utilization at 100%, GPU Utilization at 0%, and doesn't accept new generation requests of any kind.
I'm getting the same behavior with the three exl2 models I tried (llama3-70b, mistral7b-v3, qwen2-7b), so I am assuming this is related to exl2. That said, I have a feeling I saw similar behavior with gguf in the past, which is probably what made me switch to exl2, but I would have to check that again to say for certain.