cheshire-cat-ai / llama-local

MIT License

error start container #3

Open Alessandro-Vezzoli opened 10 months ago

Alessandro-Vezzoli commented 10 months ago

I want to try this with Cheshire Cat, but I'm having some issues. First I had to edit the Dockerfile to add starlette-context as a Python dependency, since I was getting an error without it:

ARG CUDA_IMAGE="12.1.1-devel-ubuntu22.04"
FROM nvidia/cuda:${CUDA_IMAGE}

# We need to set the host to 0.0.0.0 to allow outside access
ENV HOST=0.0.0.0

RUN apt-get update && apt-get upgrade -y \
    && apt-get install -y git build-essential \
    python3 python3-pip gcc wget \
    ocl-icd-opencl-dev opencl-headers clinfo \
    libclblast-dev libopenblas-dev \
    && mkdir -p /etc/OpenCL/vendors && echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd

COPY . .

# setting build related env vars
ENV CUDA_DOCKER_ARCH=all
ENV LLAMA_CUBLAS=1

# Install dependencies
RUN python3 -m pip install --upgrade pip pytest cmake scikit-build setuptools fastapi uvicorn sse-starlette pydantic-settings starlette-context

# Install llama-cpp-python (build with cuda)
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python

# Run the server
CMD python3 -m llama_cpp.server
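For reference, a minimal sketch of how I build and run this image (the image tag, port, and host model path are my own choices, not from the project docs; `llama_cpp.server` picks up the model path from the `MODEL` environment variable via its pydantic settings):

```shell
# Hedged sketch: build the image, then run it with GPU access,
# the host model directory mounted at /models, and the server
# pointed at the model file through the MODEL env var.
docker build -t llama-local .
docker run --gpus all -p 8000:8000 \
    -v "$PWD/models:/models" \
    -e MODEL=/models/llama-2-7b-chat.ggmlv3.q2_K.bin \
    llama-local
```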

After this I'm getting the error below. What is the problem?

2023-12-20 17:55:47 Traceback (most recent call last):
2023-12-20 17:55:47   File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
2023-12-20 17:55:47     return _run_code(code, main_globals, None,
2023-12-20 17:55:47   File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
2023-12-20 17:55:47     exec(code, run_globals)
2023-12-20 17:55:47   File "/usr/local/lib/python3.10/dist-packages/llama_cpp/server/__main__.py", line 96, in <module>
2023-12-20 17:55:47     app = create_app(settings=settings)
2023-12-20 17:55:47   File "/usr/local/lib/python3.10/dist-packages/llama_cpp/server/app.py", line 389, in create_app
2023-12-20 17:55:47     llama = llama_cpp.Llama(
2023-12-20 17:55:47   File "/usr/local/lib/python3.10/dist-packages/llama_cpp/llama.py", line 962, in __init__
2023-12-20 17:55:47     self._n_vocab = self.n_vocab()
2023-12-20 17:55:47   File "/usr/local/lib/python3.10/dist-packages/llama_cpp/llama.py", line 2266, in n_vocab
2023-12-20 17:55:47     return self._model.n_vocab()
2023-12-20 17:55:47   File "/usr/local/lib/python3.10/dist-packages/llama_cpp/llama.py", line 251, in n_vocab
2023-12-20 17:55:47     assert self.model is not None
2023-12-20 17:55:47 AssertionError
2023-12-20 17:55:49 ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
2023-12-20 17:55:49 ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
2023-12-20 17:55:49 ggml_init_cublas: found 1 CUDA devices:
2023-12-20 17:55:49   Device 0: NVIDIA GeForce GTX 1050 Ti, compute capability 6.1
2023-12-20 17:55:49 gguf_init_from_file: invalid magic characters 'tjgg'
2023-12-20 17:55:49 error loading model: llama_model_loader: failed to load model from /models/llama-2-7b-chat.ggmlv3.q2_K.bin

Thanks
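An aside on the log itself (my own reading, not from the thread): `gguf_init_from_file: invalid magic characters 'tjgg'` suggests the server was handed a legacy GGML v3 model (`.ggmlv3.q2_K.bin`), while recent llama-cpp-python builds only load GGUF files, so `self.model` stays `None` and the assertion fires. A minimal sketch that checks a model file's leading magic bytes (the set of legacy magics here is an assumption based on the log, not an exhaustive list):

```python
def model_format(path: str) -> str:
    """Guess a llama.cpp model container format from its first four bytes."""
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic == b"GGUF":
        return "gguf"
    # Legacy llama.cpp containers used magics such as "ggjt" stored
    # little-endian, which read back as b"tjgg" -- exactly the
    # "invalid magic characters 'tjgg'" reported in the log above.
    if magic in (b"tjgg", b"ggjt", b"lmgg", b"ggml"):
        return "legacy-ggml"
    return "unknown"
```

If the file turns out to be legacy GGML, the usual fix is to download a GGUF build of the same model (or convert it with the conversion script shipped in the llama.cpp repository) and point `MODEL` at the `.gguf` file instead.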