getumbrel / llama-gpt

A self-hosted, offline, ChatGPT-like chatbot. Powered by Llama 2. 100% private, with no data leaving your device. New: Code Llama support!
https://apps.umbrel.com/app/llama-gpt

70b and 13b do not work (kubernetes install) #85

Open iodn opened 1 year ago

iodn commented 1 year ago

Hello, I'd like to reopen this issue (#12) as I've encountered an identical error during the Kubernetes installation process.

I edited deploy/kubernetes/kustomization.yaml to change the model name as follows:

configMapGenerator:
- name: llama-gpt
  literals:
  - DEFAULT_MODEL="/models/llama-2-13b-chat.bin"

Then, when deploying with kubectl apply -k deploy/kubernetes/. -n llama, the llama-gpt-api pod kept restarting and I got the following error:

python3 setup.py develop
/usr/local/lib/python3.11/site-packages/setuptools/command/develop.py:40: EasyInstallDeprecationWarning: easy_install command is deprecated.
!!
        ********************************************************************************
        Please avoid running ``setup.py`` and ``easy_install``.
        Instead, use pypa/build, pypa/installer or other
        standards-based tools.
        See https://github.com/pypa/setuptools/issues/917 for details.
        ********************************************************************************
!!
  easy_install.initialize_options(self)
--------------------------------------------------------------------------------
-- Trying 'Ninja' generator
--------------------------------
---------------------------
----------------------
-----------------
------------
-------
--
Not searching for unused variables given on the command line.
CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.
  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.
-- The C compiler identification is GNU 10.2.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- The CXX compiler identification is GNU 10.2.1
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done (0.3s)
-- Generating done (0.0s)
-- Build files have been written to: /app/_cmake_test_compile/build
--
-------
------------
-----------------
----------------------
---------------------------
--------------------------------
-- Trying 'Ninja' generator - success
--------------------------------------------------------------------------------
Configuring Project
  Working directory:
    /app/_skbuild/linux-aarch64-3.11/cmake-build
  Command:
    /usr/local/lib/python3.11/site-packages/cmake/data/bin/cmake /app -G Ninja --no-warn-unused-cli -DCMAKE_INSTALL_PREFIX:PATH=/app/_skbuild/linux-aarch64-3.11/cmake-install -DPYTHON_VERSION_STRING:STRING=3.11.4 -DSKBUILD:INTERNAL=TRUE -DCMAKE_MODULE_PATH:PATH=/usr/local/lib/python3.11/site-packages/skbuild/resources/cmake -DPYTHON_EXECUTABLE:PATH=/usr/local/bin/python3 -DPYTHON_INCLUDE_DIR:PATH=/usr/local/include/python3.11 -DPYTHON_LIBRARY:PATH=/usr/local/lib/libpython3.11.so -DPython_EXECUTABLE:PATH=/usr/local/bin/python3 -DPython_ROOT_DIR:PATH=/usr/local -DPython_FIND_REGISTRY:STRING=NEVER -DPython_INCLUDE_DIR:PATH=/usr/local/include/python3.11 -DPython_NumPy_INCLUDE_DIRS:PATH=/usr/local/lib/python3.11/site-packages/numpy-1.25.1-py3.11-linux-aarch64.egg/numpy/core/include -DPython3_EXECUTABLE:PATH=/usr/local/bin/python3 -DPython3_ROOT_DIR:PATH=/usr/local -DPython3_FIND_REGISTRY:STRING=NEVER -DPython3_INCLUDE_DIR:PATH=/usr/local/include/python3.11 -DPython3_NumPy_INCLUDE_DIRS:PATH=/usr/local/lib/python3.11/site-packages/numpy-1.25.1-py3.11-linux-aarch64.egg/numpy/core/include -DCMAKE_BUILD_TYPE:STRING=Release
Not searching for unused variables given on the command line.
-- The C compiler identification is GNU 10.2.1
-- The CXX compiler identification is GNU 10.2.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done (0.3s)
-- Generating done (0.0s)
-- Build files have been written to: /app/_skbuild/linux-aarch64-3.11/cmake-build
[1/2] Generating /app/vendor/llama.cpp/libllama.so
make[1]: Entering directory '/app/vendor/llama.cpp'
I llama.cpp build info:
I UNAME_S:  Linux
I UNAME_P:  unknown
I UNAME_M:  aarch64
I CFLAGS:   -I.              -O3 -std=c11   -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -DGGML_USE_K_QUANTS -mcpu=native
I CXXFLAGS: -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -DGGML_USE_K_QUANTS -mcpu=native
I LDFLAGS:
I CC:       cc (Debian 10.2.1-6) 10.2.1 20210110
I CXX:      g++ (Debian 10.2.1-6) 10.2.1 20210110
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -DGGML_USE_K_QUANTS -mcpu=native -c llama.cpp -o llama.o
cc  -I.              -O3 -std=c11   -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -DGGML_USE_K_QUANTS -mcpu=native   -c ggml.c -o ggml.o
cc -I.              -O3 -std=c11   -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -DGGML_USE_K_QUANTS -mcpu=native   -c -o k_quants.o k_quants.c
k_quants.c: In function ‘ggml_vec_dot_q2_K_q8_K’:
k_quants.c:1273:36: warning: missing braces around initializer [-Wmissing-braces]
 1273 |         const int16x8x2_t mins16 = {vreinterpretq_s16_u16(vmovl_u8(vget_low_u8(mins))), vreinterpretq_s16_u16(vmovl_u8(vget_high_u8(mins)))};
      |                                    ^
      |                                     {                                                                                                      }
k_quants.c: In function ‘ggml_vec_dot_q6_K_q8_K’:
k_quants.c:3371:38: warning: missing braces around initializer [-Wmissing-braces]
 3371 |         const int16x8x2_t q6scales = {vmovl_s8(vget_low_s8(scales)), vmovl_s8(vget_high_s8(scales))};
      |                                      ^
      |                                       {                                                            }
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -DGGML_USE_K_QUANTS -mcpu=native -shared -fPIC -o libllama.so llama.o ggml.o k_quants.o
make[1]: Leaving directory '/app/vendor/llama.cpp'
[1/2] Install the project...
-- Install configuration: "Release"
-- Installing: /app/_skbuild/linux-aarch64-3.11/cmake-install/llama_cpp/libllama.so
copying _skbuild/linux-aarch64-3.11/cmake-install/llama_cpp/libllama.so -> llama_cpp/libllama.so
running develop
/usr/local/lib/python3.11/site-packages/setuptools/command/develop.py:40: EasyInstallDeprecationWarning: easy_install command is deprecated.
!!
        ********************************************************************************
        Please avoid running ``setup.py`` and ``easy_install``.
        Instead, use pypa/build, pypa/installer or other
        standards-based tools.
        See https://github.com/pypa/setuptools/issues/917 for details.
        ********************************************************************************
!!
  easy_install.initialize_options(self)
/usr/local/lib/python3.11/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!
        ********************************************************************************
        Please avoid running ``setup.py`` directly.
        Instead, use pypa/build, pypa/installer or other
        standards-based tools.
        See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
        ********************************************************************************
!!
  self.initialize_options()
running egg_info
writing llama_cpp_python.egg-info/PKG-INFO
writing dependency_links to llama_cpp_python.egg-info/dependency_links.txt
writing requirements to llama_cpp_python.egg-info/requires.txt
writing top-level names to llama_cpp_python.egg-info/top_level.txt
reading manifest file 'llama_cpp_python.egg-info/SOURCES.txt'
adding license file 'LICENSE.md'
writing manifest file 'llama_cpp_python.egg-info/SOURCES.txt'
running build_ext
Creating /usr/local/lib/python3.11/site-packages/llama-cpp-python.egg-link (link to .)
llama-cpp-python 0.1.77 is already the active version in easy-install.pth
Installed /app
Processing dependencies for llama-cpp-python==0.1.77
Searching for diskcache==5.6.1
Best match: diskcache 5.6.1
Processing diskcache-5.6.1-py3.11.egg
Adding diskcache 5.6.1 to easy-install.pth file
Using /usr/local/lib/python3.11/site-packages/diskcache-5.6.1-py3.11.egg
Searching for numpy==1.25.1
Best match: numpy 1.25.1
Processing numpy-1.25.1-py3.11-linux-aarch64.egg
Adding numpy 1.25.1 to easy-install.pth file
Installing f2py script to /usr/local/bin
Installing f2py3 script to /usr/local/bin
Installing f2py3.11 script to /usr/local/bin
Using /usr/local/lib/python3.11/site-packages/numpy-1.25.1-py3.11-linux-aarch64.egg
Searching for typing-extensions==4.7.1
Best match: typing-extensions 4.7.1
Adding typing-extensions 4.7.1 to easy-install.pth file
Using /usr/local/lib/python3.11/site-packages
Finished processing dependencies for llama-cpp-python==0.1.77
Initializing server with:
Batch size: 2096
Number of CPU threads: 4
Number of GPU layers: 0
Context window: 4096
/usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:126: UserWarning: Field "model_alias" has conflict with protected namespace "model_".
You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ('settings_',)`.
  warnings.warn(
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/app/llama_cpp/server/__main__.py", line 46, in <module>
    app = create_app(settings=settings)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/llama_cpp/server/app.py", line 313, in create_app
    llama = llama_cpp.Llama(
            ^^^^^^^^^^^^^^^^
  File "/app/llama_cpp/llama.py", line 308, in __init__
    raise ValueError(f"Model path does not exist: {model_path}")
ValueError: Model path does not exist: /models/llama-2-13b-chat.bin
Exception ignored in: <function Llama.__del__ at 0xffff9928f060>
Traceback (most recent call last):
  File "/app/llama_cpp/llama.py", line 1507, in __del__
    if self.model is not None:
       ^^^^^^^^^^
AttributeError: 'Llama' object has no attribute 'model'

Note: Works fine with the default model llama-2-7b-chat.bin.
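
For anyone hitting the same error, a quick way to check whether the 13b file is actually present in the pod's volume is something like the following (a sketch; the deployment name llama-gpt-api is inferred from this thread and may differ in your install):

# List the contents of the model volume inside the running api pod
kubectl exec -n llama deploy/llama-gpt-api -- ls -lh /models

If the pod is crash-looping too fast to exec into, the same listing can be taken from a temporary debug pod that mounts the same volume.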

andytron303 commented 1 year ago

I see the same attribute error when substituting the 13b model in Docker.
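
A minimal way to rule out a missing file on the Docker side (a sketch; the container name is an assumption, so check docker ps for the actual one):

# Verify the model file exists inside the api container
docker exec llama-gpt-api ls -lh /models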

voarsh2 commented 10 months ago

Even when I set llama-2-7b-chat.bin it says the model path does not exist. Am I supposed to download the model into the path "/models"?
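
If so, something like this might work as a stopgap (a sketch, not the project's documented flow; the Hugging Face URL and the pod name are assumptions, and the destination filename must match DEFAULT_MODEL exactly):

# Download a GGML chat model locally (URL is an assumption; adjust to the file you need)
wget https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q4_0.bin
# Copy it into the api pod's model volume under the name the server expects
kubectl cp llama-2-7b-chat.ggmlv3.q4_0.bin llama/<api-pod-name>:/models/llama-2-7b-chat.bin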