When we start the process running llama-cpp-python, we provide a pipe for stderr and then promptly close it. This means that if llama-cpp-python tries to write to stderr, a broken pipe exception is thrown — which happens, for example, when there is a prefix cache hit while processing a prompt.
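A minimal sketch of the failure mode (hypothetical code, not the project's actual launch logic): the parent hands the child a pipe for stderr, then closes its end of that pipe, so the child's next write to stderr hits a broken pipe.

```python
import subprocess
import sys

# Spawn a child that writes a large chunk to stderr after a short delay,
# standing in for llama-cpp-python logging on a prefix cache hit.
proc = subprocess.Popen(
    [
        sys.executable,
        "-c",
        "import sys, time; time.sleep(0.5); sys.stderr.write('x' * 100000)",
    ],
    stderr=subprocess.PIPE,
)

# Promptly close our read end of the stderr pipe, as described above.
proc.stderr.close()

# The child's write to stderr now raises BrokenPipeError, so it exits nonzero.
proc.wait()
```

Because CPython ignores SIGPIPE by default, the child dies with an unhandled `BrokenPipeError` rather than being killed silently, which is exactly the noisy failure we see.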
#19 is a quick fix for this, but it's a bit icky because we're still breaking stderr.
What we need to do here is:
- Not provide a broken pipe for stderr
- Actually capture logs from llama-cpp-python so they end up in the same place as the logs from uvicorn (added in #18)
Some ideas for how to do this:
- Contribute a fix back to llama-cpp-python that updates writes to stderr to write to a logger instead, so the logger can be configured
- Instead of just spawning the llama-cpp-python process, spawn an intermediate process that itself spawns that process, captures its stderr as it runs, and streams it to a log
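The second idea could be sketched roughly like this (a hypothetical illustration, assuming a reader thread rather than a full intermediate process; the logger name and child command are placeholders):

```python
import logging
import subprocess
import sys
import threading

logging.basicConfig(level=logging.INFO, format="%(name)s: %(message)s")
log = logging.getLogger("llama-cpp-python")  # hypothetical logger name

captured = []  # kept only so the behavior is easy to inspect/test

def stream_stderr(pipe):
    # Drain the child's stderr line by line and forward each line to the
    # logger, so its output lands alongside uvicorn's logs instead of
    # going to a closed (broken) pipe.
    with pipe:
        for raw in iter(pipe.readline, b""):
            line = raw.decode(errors="replace").rstrip()
            captured.append(line)
            log.info(line)

# Placeholder child standing in for the llama-cpp-python process.
proc = subprocess.Popen(
    [sys.executable, "-c", "import sys; sys.stderr.write('hello from child\\n')"],
    stderr=subprocess.PIPE,
)

reader = threading.Thread(target=stream_stderr, args=(proc.stderr,))
reader.start()
proc.wait()
reader.join(timeout=5)
```

Keeping the pipe open and continuously drained also avoids the child blocking once the OS pipe buffer fills, which a plain `stderr=subprocess.PIPE` without a reader would eventually cause.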