bmahabirbu opened this issue 5 hours ago
Yeah thanks for opening an issue on this, I noticed this too recently... I have a feeling the behaviour in llama.cpp changed
I would favour os.execvp because it's one less process
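For context, a minimal sketch of the difference between the two; the image name and model path here are placeholders, not what ramalama actually uses:

```python
import os
import subprocess

args = ["podman", "run", "--rm", "IMAGE", "llama-cli", "-m", "/path/to/model.gguf"]

# subprocess.run spawns a child process and keeps the Python runtime alive as its parent.
# subprocess.run(args, check=True)

# os.execvp replaces the current Python process with podman, so nothing extra
# sits between the caller and the container runtime.
os.execvp(args[0], args)
```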
Podman does not do anything with stderr, just passes it back to the caller.
Does your new exec_args work?
My guess would be that upstream llama.cpp changed the debug/verbose output to go to stdout rather than stderr again. But I haven't checked, so it's just a guess; this isn't the first time a change like this has happened:
https://github.com/containers/ramalama/pull/187
Maybe we could try --log-disable again.
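For reference, a minimal sketch of what that might look like, assuming exec_args is the argument list built for llama-cli; the model path and flags are illustrative:

```python
# Hypothetical argument list for llama-cli; the paths and flags are made up.
exec_args = ["llama-cli", "-m", "/path/to/model.gguf", "-c", "2048"]

debug = False  # would come from ramalama's --debug flag

if not debug:
    # Ask llama.cpp itself to suppress its logging rather than relying on
    # stderr redirection outside the container.
    exec_args += ["--log-disable"]
```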
We may have to consider writing our own application against llama.cpp as a library at some point in the future. I'm not sure if llama.cpp intends on stabilising llama-cli and llama-server; they are considered "examples" in that project. We just don't have the capacity to do this right now.
We recently updated the version of llama.cpp to get @slp's neat Kompute functionality in, which was successfully upstreamed 😄
@bmahabirbu good analysis, I see what you mean. Before, we used to run Python both inside and outside the container, so that would have altered the behaviour a bit as well. You could be right; I haven't played around with this to check.
I am noticing that when implementing llama-cpp-python-server we will probably have to use a little bit of python3 inside the container for that runtime again.
In an older commit, the Podman and llama-cli commands were built separately, which caused the container's stderr to be redirected to null (without --debug), and that worked. Something analogous to the sketch below:
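(This is only an illustration of the idea; the image name, model path, and flags are placeholders, not the actual ramalama code.)

```python
import subprocess

# Host side: the podman invocation is built on its own.
podman_args = ["podman", "run", "--rm", "IMAGE"]

# Container side: the llama-cli command is a separate string, so a shell
# redirect can drop its stderr *inside* the container when --debug is off.
container_cmd = "llama-cli -m /path/to/model.gguf 2>/dev/null"

subprocess.run(podman_args + ["sh", "-c", container_cmd], check=True)
```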
Now what is happening is essentially the sketch below, which only redirects the stderr of the current runtime, not of the process inside the container:
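(Again only an illustration of the shape of the problem, with placeholder names.)

```python
import os
import sys

args = ["podman", "run", "--rm", "IMAGE", "llama-cli", "-m", "/path/to/model.gguf"]

# Rebinding sys.stderr only affects writes made by this Python process;
# file descriptor 2 itself is left pointing at the terminal.
sys.stderr = open(os.devnull, "w")

# After the exec, podman inherits the original fd 2, so anything llama-cli
# prints to stderr inside the container still reaches the terminal.
os.execvp(args[0], args)
```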
I'm currently looking into giving exec_cmd functionality similar to the sketch after this comment, but I'm having a hard time figuring out what the best solution should be. I'm not sure if Podman has a way of redirecting stderr inside the container, or if I should play around with using subprocess.run vs os.execvp.
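Two possible shapes, sketched under the assumption that we end up with a full argv list; none of this is the actual exec_cmd API, and the image name and model path are placeholders:

```python
import os
import subprocess

image = "IMAGE"  # placeholder image name
cli_cmd = "llama-cli -m /path/to/model.gguf"  # placeholder model path

# Option A: keep os.execvp, but wrap the command in a shell inside the
# container so the 2>/dev/null applies to llama-cli itself.
os.execvp("podman", ["podman", "run", "--rm", image, "sh", "-c",
                     f"{cli_cmd} 2>/dev/null"])

# Option B (never reached after the exec above, shown for comparison): use
# subprocess.run on the host and discard the stderr that podman forwards
# back from the container.
# subprocess.run(["podman", "run", "--rm", image] + cli_cmd.split(),
#                stderr=subprocess.DEVNULL, check=True)
```

Option A silences llama-cli regardless of what the host does; Option B relies on Podman passing the container's stderr back to the caller, as mentioned above.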