Open v4u6h4n opened 6 months ago
Same here, Radeon Pro W5700
llava-v1.5-7b-q4.llamafile --version
llamafile v0.8.0
relevant perhaps: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html
Hey :-)
Did it fix anything for you?
Doesn't seem to have, but I'm not sure it installed properly.
I was able to make it work by changing the base image of my container to FROM nvcr.io/nvidia/pytorch:24.03-py3
That base image is gigantic (~14.6 GB), so the best option would probably be a Docker multi-stage build that extracts only nvcc and its dependencies.
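A minimal sketch of what that multi-stage build could look like. The runtime base tag and the /usr/local/cuda path are assumptions (they vary by CUDA version and distro), not something tested here:

```dockerfile
# Stage 1: the big PyTorch image, used only as a source for the CUDA toolkit.
FROM nvcr.io/nvidia/pytorch:24.03-py3 AS builder

# Stage 2: a much smaller CUDA runtime base (tag is an assumption).
FROM nvidia/cuda:12.4.0-runtime-ubuntu22.04
# Copy the CUDA toolkit (including nvcc) out of the builder stage.
# /usr/local/cuda is the usual install location, but verify it in your image.
COPY --from=builder /usr/local/cuda /usr/local/cuda
ENV PATH=/usr/local/cuda/bin:$PATH
RUN apt update && apt install -y wget
COPY start.sh /
RUN chmod +x /start.sh
CMD /start.sh
```

The final image only carries the runtime base plus the toolkit directory, instead of the whole 14.6 GB PyTorch stack.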
@fcrisciani Unfortunately I am enough of an amateur linux user that I don't know what that means lol but happy you got it working ;-)
I was referring to creating a docker image (https://docs.docker.com/engine/install/)
My Dockerfile looks like:
FROM nvcr.io/nvidia/pytorch:24.03-py3
RUN apt update && apt install -y wget
COPY start.sh /
RUN chmod +x /start.sh
CMD /start.sh
the start file looks like:
#!/bin/bash
echo "Download llamafile..."
wget "https://huggingface.co/jartine/llava-v1.5-7B-GGUF/resolve/main/llava-v1.5-7b-q4.llamafile?download=true" -O /tmp/llava-v1.5-7b-q4.llamafile
echo "Start serving the llamafile"
chmod +x /tmp/llava-v1.5-7b-q4.llamafile
/tmp/llava-v1.5-7b-q4.llamafile -ngl 999 --gpu nvidia --nobrowser --host 0.0.0.0
you can:
1) install docker
2) create a folder with the 2 files above: Dockerfile and start.sh
3) build the container image: docker build -t my_gpu_test .
4) run it: docker run --rm -it --gpus=all my_gpu_test
@fcrisciani it looks like you may be suggesting a fix that works in your case with an NVIDIA GPU, but the OP's issue relates to an AMD GPU. Considering the use case of llamafile being a single-file LLM that utilizes your GPU, wouldn't a Docker install be big overkill for this problem, and would your fix even address the AMD side of things?
Conceptually the solution is the same: my understanding is that for NVIDIA GPUs nvcc is the dependency, while for AMD it's hipcc. If you properly install all the dependencies on your machine, it should work without Docker. I used Docker just to create an image with all the dependencies baked in, so that I can move it to different machines without installing everything manually, but that's a user preference.
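As a quick sanity check before reaching for Docker, you can verify whether the compiler llamafile needs is actually on your PATH. A minimal sketch (just PATH lookups, no install logic):

```shell
#!/bin/sh
# Check for the GPU compilers llamafile may look for:
# nvcc for NVIDIA, hipcc for AMD/ROCm.
for tool in nvcc hipcc; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found at $(command -v "$tool")"
  else
    echo "$tool: not found on PATH"
  fi
done
```

If neither is found, llamafile can't compile its GPU module and silently falls back to CPU, which matches the "Not compiled with GPU offload support" warning seen later in this thread.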
I'm also seeing no GPU offload. When launching, I saw a message saying you need to pass -ngl 9999, but the docs/GitHub page say -ngl 999. (Which is it? I've tried both, with no difference.)
I went digging and tried --gpu nvidia -ngl 999 which now gives me
import_cuda_impl: initializing gpu module...
get_nvcc_path: note: nvcc.exe not found on $PATH
get_nvcc_path: note: /opt/cuda/bin/nvcc.exe does not exist
link_cuda_dso: note: dynamically linking /C/users/user/.llamafile/v/0.8.5/ggml-cuda.dll
link_cuda_dso: warning: library not found: failed to load library
fatal error: support for --gpu nvidia was explicitly requested, but it wasn't available
WAY up in the output there's:
{"function":"server_params_parse","level":"WARN","line":2424,"msg":"Not compiled with GPU offload support, --n-gpu-layers option will be ignored. See main README.md for information on enabling GPU BLAS support","n_gpu_layers":-1,"tid":"11820704","timestamp":1727726156}
note: if you have an AMD or NVIDIA GPU then you need to pass -ngl 9999 to enable GPU offloading
@nPHYN1T3
I had to focus my attention on another project, so I didn't get to follow this up myself. If you do, you could look into dependencies that may not be readily documented by this project; that might fix the issue. If I get time this month and give it a try before you do, I'll post here again.
I didn't see anything about dependencies, but CUDA and anything else it might need should already be installed, since I run ollama on bare metal. The docs are rather disjointed: a few places talk about consulting the README.md, which doesn't make sense when all you grab is a single "containerized" file.
Hey everyone, awesome project :-) I'm having fun playing around with it, but I think my GPU isn't being utilised. I can see my CPU maxing out, but not much of a change in my GPU usage, so I'm wondering what the issue is. Here's the output in the terminal:
...and my system specs: