When we start the process running llama-cpp-python, we provide a pipe for stderr and then promptly close it. This means that if llama-cpp-python tries to write to stderr, a broken pipe exception is thrown — which happens, for example, when there is a prefix cache hit while processing a prompt.
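A minimal sketch of the failure mode (hypothetical code, not the project's actual launch logic): the parent hands the child a pipe for stderr, then closes its end of that pipe, so the child's next write to stderr hits a broken pipe.

```python
import subprocess
import sys

# Spawn a child that writes a large chunk to stderr after a short delay,
# standing in for llama-cpp-python logging on a prefix cache hit.
proc = subprocess.Popen(
    [
        sys.executable,
        "-c",
        "import sys, time; time.sleep(0.5); sys.stderr.write('x' * 100000)",
    ],
    stderr=subprocess.PIPE,
)

# Promptly close our read end of the stderr pipe, as described above.
proc.stderr.close()

# The child's write to stderr now raises BrokenPipeError, so it exits nonzero.
proc.wait()
```

Because CPython ignores SIGPIPE by default, the child dies with an unhandled `BrokenPipeError` rather than being killed silently, which is exactly the noisy failure we see.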
#19 is a quick fix for this, but it's a bit icky because we're still breaking stderr.
What we need to do here is:
- Not provide a broken pipe for stderr
- Actually capture logs from llama-cpp-python so they end up in the same place as the logs from uvicorn (added in #18)
Some ideas for how to do this:
- Contribute a fix back to llama-cpp-python that updates writes to stderr to write to a logger instead, so the logger can be configured
- Instead of just spawning the llama-cpp-python process, spawn an intermediate process that itself spawns that process, captures its stderr as it runs, and streams it to a log
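The second idea could be sketched roughly like this (a hypothetical illustration, assuming a reader thread rather than a full intermediate process; the logger name and child command are placeholders):

```python
import logging
import subprocess
import sys
import threading

logging.basicConfig(level=logging.INFO, format="%(name)s: %(message)s")
log = logging.getLogger("llama-cpp-python")  # hypothetical logger name

captured = []  # kept only so the behavior is easy to inspect/test

def stream_stderr(pipe):
    # Drain the child's stderr line by line and forward each line to the
    # logger, so its output lands alongside uvicorn's logs instead of
    # going to a closed (broken) pipe.
    with pipe:
        for raw in iter(pipe.readline, b""):
            line = raw.decode(errors="replace").rstrip()
            captured.append(line)
            log.info(line)

# Placeholder child standing in for the llama-cpp-python process.
proc = subprocess.Popen(
    [sys.executable, "-c", "import sys; sys.stderr.write('hello from child\\n')"],
    stderr=subprocess.PIPE,
)

reader = threading.Thread(target=stream_stderr, args=(proc.stderr,))
reader.start()
proc.wait()
reader.join(timeout=5)
```

Keeping the pipe open and continuously drained also avoids the child blocking once the OS pipe buffer fills, which a plain `stderr=subprocess.PIPE` without a reader would eventually cause.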