containers / ramalama

The goal of RamaLama is to make working with AI boring.

RamaLama isn't redirecting stderr correctly #431

Closed bmahabirbu closed 2 weeks ago

bmahabirbu commented 2 weeks ago

In an older commit, the Podman and llama-cli commands were separate, so the container's stderr got redirected to /dev/null (without --debug), which worked. Analogous to this example:

podman run --rm  etc
llama-cli run --rm  etc   2>/dev/null

Now what happens is essentially this:

podman run  etc  llama-cli  etc  2>/dev/null

This only redirects the stderr of the current process on the host, not the stderr of llama-cli inside the container.

I'm currently looking into changing exec_cmd to get functionality similar to the example below, but I'm having a hard time figuring out what the best solution should be:

podman run etc "llama-cli  etc 2>/dev/null" (llama-cli inside quotes)

I'm not sure whether Podman has a way of redirecting stderr inside the container, or whether I should play around with subprocess.run vs os.execvp.

import os
import sys

# perror is RamaLama's error-printing helper.

def exec_cmd(args, stderr=True, debug=False):
    if debug:
        perror("exec_cmd: ", *args)

    if not stderr:
        # Redirect this process's stderr to /dev/null.
        # This is where the issue shows up: even with stderr=False we are
        # still getting stderr output from inside the container.
        with open(os.devnull, "w") as devnull:
            os.dup2(devnull.fileno(), sys.stderr.fileno())

    try:
        # Replace the current process with the given command (e.g. podman).
        return os.execvp(args[0], args)
    except Exception:
        perror(f"os.execvp({args[0]}, {args})")
        raise
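
One way to get the 2>/dev/null interpreted inside the container rather than on the host is to run an actual shell in the container, since the quoted-command idea above only works if something inside the container parses the redirection. A minimal sketch of that idea (the image name, model path and helper are illustrative placeholders, not RamaLama's actual command construction):

def build_container_cmd(model_args, quiet=True):
    # Illustrative only: join the llama-cli invocation into one string.
    llama_cmd = "llama-cli " + " ".join(model_args)
    if quiet:
        # The redirection is part of the string handed to `sh -c`,
        # so it is evaluated by the shell inside the container.
        llama_cmd += " 2>/dev/null"
    return ["podman", "run", "--rm", "quay.io/example/image",
            "sh", "-c", llama_cmd]

args = build_container_cmd(["-m", "/models/model.gguf"])
# args could then be handed to exec_cmd()/os.execvp as usual.
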
ericcurtin commented 2 weeks ago

Yeah thanks for opening an issue on this, I noticed this too recently... I have a feeling the behaviour in llama.cpp changed

ericcurtin commented 2 weeks ago

I would favour os.execvp because it's one less process
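
For what it's worth, a rough illustration of the difference (not RamaLama code; the podman arguments would be placeholders): subprocess.run forks a child and keeps the Python process alive, while os.execvp replaces the Python process with podman, so there is one less process in the chain.

import os
import subprocess

def run_container(args, replace_process=False):
    if replace_process:
        # Replace the current Python process image with podman: one less
        # process, but nothing after this call ever runs.
        os.execvp(args[0], args)
    else:
        # Spawn podman as a child: python3 stays alive as the parent and
        # can inspect the result afterwards.
        return subprocess.run(args, check=True)

# Example with a placeholder image:
# run_container(["podman", "run", "--rm", "quay.io/example/image", "llama-cli", "--help"])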

rhatdan commented 2 weeks ago

Podman does not do anything with stderr, it just passes it back to the caller.
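
A quick way to check this from Python (illustrative only; the alpine image and echo command are just stand-ins): if Podman simply attaches the container's stderr to its own, then capturing the podman process's stderr also captures whatever the containerized command writes to stderr.

import subprocess

# The command inside the container writes to stderr; podman forwards it
# to its own stderr, so capturing podman's stderr captures it too.
result = subprocess.run(
    ["podman", "run", "--rm", "docker.io/library/alpine",
     "sh", "-c", "echo 'from inside the container' >&2"],
    stderr=subprocess.PIPE,
    text=True,
)
print("captured:", result.stderr.strip())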

Does your new exec_args work?

ericcurtin commented 2 weeks ago

My guess would be, upstream llama.cpp changed the debug/verbose output to be stdout rather than stderr again. But I haven't checked, so it's just a guess, this isn't the first time a change like this has happened:

https://github.com/containers/ramalama/pull/187

maybe we could try --log-disable again.
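
If the verbose output did move to stdout, passing --log-disable when --debug isn't set is one possible workaround; a hedged sketch of what that could look like (the helper name and argument handling are hypothetical, not the actual RamaLama code):

def llama_cli_args(model_path, debug=False):
    # Hypothetical helper: build the llama-cli argument list.
    args = ["llama-cli", "-m", model_path]
    if not debug:
        # Ask llama-cli to silence its own logging instead of relying on
        # a stderr redirection outside the container.
        args.append("--log-disable")
    return args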

We may have to consider writing our own application against llama.cpp as a library at some point in the future. I'm not sure if llama.cpp intends to stabilise llama-cli and llama-server; they are considered "examples" in that project. We just don't have the capacity to do this right now.

We recently updated the version of llama.cpp to get @slp's neat Kompute functionality, which was successfully upstreamed 😄

ericcurtin commented 2 weeks ago

@bmahabirbu good analysis, I see what you mean; before, we used to run Python both inside and outside the container, so that would have altered the behaviour a bit also. You could be right, I haven't played around with this to check.

I am noticing that when implementing llama-cpp-python-server we will probably have to use a little bit of python3 inside the container for that runtime again.

bmahabirbu commented 2 weeks ago

I tested this against the latest version of llama.cpp with the llama-cli command wrapped in quotes and 2>/dev/null added, and it still works as it once did!

That's an interesting idea to build against llama.cpp as a library! I didn't know that llama-cli/llama-server are just example code. I'll check this out further!

Lately I've been playing around with getting Vulkan to work inside a container on WSL2, so I'd love to test out Kompute as well!

ericcurtin commented 2 weeks ago

@bmahabirbu if you've tested that the latest version of llama.cpp works fine, then if you can open a PR updating the llama.cpp commit id we use to that one, we should be good to close this...