Closed bobcatfish closed 4 months ago
If you run models directly using llama-cpp-python's webserver, helpful output about what's going on is written to stdout and stderr (and we even capture that output to make sure things start up correctly), but once the model is running, nothing captures that output or writes it anywhere.
This makes it difficult to debug when things go wrong (like #3), so we should capture this output as logs and make it available.
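One way this could be done is to keep pumping the child process's stdout/stderr into a logger for the lifetime of the server, rather than only reading it during startup. A minimal sketch (the command and logger name are placeholders, not the project's actual code):

```python
import logging
import subprocess
import threading

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("llama-server")


def pump(stream, level):
    """Forward each line from the child's stream to our logger until EOF."""
    for line in iter(stream.readline, ""):
        logger.log(level, line.rstrip())
    stream.close()


def run_and_capture(cmd):
    """Start the server process and keep draining its output in background threads."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered so log lines show up as they are emitted
    )
    threads = [
        threading.Thread(target=pump, args=(proc.stdout, logging.INFO), daemon=True),
        threading.Thread(target=pump, args=(proc.stderr, logging.WARNING), daemon=True),
    ]
    for t in threads:
        t.start()
    return proc, threads


if __name__ == "__main__":
    # Stand-in command; the real invocation would be the llama-cpp-python
    # webserver, e.g. ["python", "-m", "llama_cpp.server", ...]
    proc, threads = run_and_capture(
        ["python", "-c", "import sys; print('model loaded'); print('oops', file=sys.stderr)"]
    )
    proc.wait()
    for t in threads:
        t.join()
```

Because the reader threads run for the whole process lifetime, output produced after startup (the part currently lost) would also land in the logs and could be persisted or surfaced however we like.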