Yeah thanks for opening an issue on this, I noticed this too recently... I have a feeling the behaviour in llama.cpp changed
I would favour os.execvp because it's one less process
Podman does not do anything with stderr, it just passes it back to the caller.
Does your new exec_args work?
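For context on the os.execvp vs subprocess.run tradeoff mentioned above, a minimal sketch of the difference between the two calls (the argv here is just a placeholder, not the actual ramalama invocation):

```python
import os
import subprocess

# Placeholder argv; not the real ramalama/podman command line.
args = ["podman", "run", "--rm", "-it", "quay.io/example/image",
        "llama-cli", "--help"]

# subprocess.run forks a child process: Python stays alive as the parent,
# can inspect the exit status, and can redirect the child's streams.
result = subprocess.run(args)
print(result.returncode)

# os.execvp replaces the current Python process with podman entirely, so there
# is no extra Python process left around ("one less process"); nothing after
# this call ever runs, and any stream handling has to be set up beforehand.
os.execvp(args[0], args)
```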
My guess would be, upstream llama.cpp changed the debug/verbose output to be stdout rather than stderr again. But I haven't checked, so it's just a guess, this isn't the first time a change like this has happened:
https://github.com/containers/ramalama/pull/187
maybe we could try --log-disable again.
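If we go that route, it would roughly just mean appending the flag to the llama-cli argument list, something like the sketch below (the surrounding arguments are placeholders, and whether --log-disable still exists and behaves the same in current llama.cpp needs checking):

```python
# Hypothetical sketch: silence llama.cpp's own logging instead of relying on a
# stderr redirect; verify --log-disable is still supported before using it.
exec_args = ["llama-cli", "-m", "/models/model.gguf", "-p", "Hello"]
exec_args.append("--log-disable")
```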
We may have to consider writing our own application against llama.cpp as a library at some point in the future. I'm not sure if llama.cpp intends to stabilise llama-cli and llama-server; they are considered "examples" in that project. We just don't have the capacity to do this right now.
We recently updated the version of llama.cpp to get @slp 's neat Kompute functionality in, which was successfully upstreamed.
@bmahabirbu good analysis, I see what you mean. Before, we used to run Python both inside and outside the container, so that would have altered the behaviour a bit as well. You could be right; I haven't played around with this to check.
I'm noticing that when implementing llama-cpp-python-server we will probably have to use a little bit of python3 inside the container for that runtime again.
I tested this against the latest version of llama.cpp with the llama-cli command wrapped in quotes and 2> /dev/null added, and it still works as it once did!
That's an interesting idea! I didn't know that llama-cli/llama-server are just example code. I'll check this out further!
Lately I've been playing around with getting Vulkan to work inside a container on WSL2, so I'd love to test out Kompute as well!
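For reference, roughly what that wrapping looks like from Python (the image name, model path, and arguments here are placeholders rather than the actual ramalama invocation):

```python
import subprocess

# Hypothetical sketch: run the quoted llama-cli command through a shell inside
# the container, so the 2> /dev/null redirect applies to llama-cli itself
# rather than to anything running on the host.
inner_cmd = "llama-cli -m /models/model.gguf -p 'Hello' 2> /dev/null"
subprocess.run(["podman", "run", "--rm", "-it",
                "quay.io/example/ramalama",   # placeholder image name
                "sh", "-c", inner_cmd],
               check=True)
```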
@bmahabirbu if you've tested that the latest version of llama.cpp works fine, and you can open a PR updating the commit ID of llama.cpp we use to that one, we should be good to close this...
In an older commit, the Podman and llama-cli commands were separated, so the container's stderr was redirected to null (without --debug), which worked. Analogous to this example:
Now what is happening is essentially this:
This only redirects the stderr of the current runtime, not the stderr inside the container.
I'm currently looking into exec_cmd to get functionality similar to this below, but I'm having a hard time figuring out what the best solution should be.
I'm not sure if Podman has a way of redirecting stderr inside the container, or if I should play around with using subprocess.run vs os.execvp.
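For what it's worth, a minimal sketch of the two approaches being weighed, assuming a placeholder image name and model path; whether silencing the podman client's stderr on the host actually drops the container's output (especially with a TTY allocated) would still need testing:

```python
import os
import subprocess

IMAGE = "quay.io/example/ramalama"                      # placeholder image name
LLAMA_ARGS = ["llama-cli", "-m", "/models/model.gguf"]  # placeholder arguments

# Option A: keep Python as the parent process and drop whatever the podman
# client writes to its own stderr on the host side.
def run_with_subprocess():
    subprocess.run(["podman", "run", "--rm", "-it", IMAGE, *LLAMA_ARGS],
                   stderr=subprocess.DEVNULL, check=True)

# Option B: replace the Python process with podman (one less process) and bake
# the redirect into the command the container runs, so llama-cli's stderr is
# discarded inside the container itself.
def run_with_execvp():
    cmd = ["podman", "run", "--rm", "-it", IMAGE,
           "sh", "-c", " ".join(LLAMA_ARGS) + " 2> /dev/null"]
    os.execvp(cmd[0], cmd)
```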