Open ProgrammingLife opened 4 months ago
I'm also having a similar problem when trying to install the 70B model on Arch Linux.
At first I hit the same issue as in #86, and then the system crashed because it maxed out the memory.
After a reboot, running the command again (./run.sh --model 70b) gives me this on repeat:
llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api:8000] not yet available...
llama-gpt-api-1 | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:127: UserWarning: Field "model_alias" has conflict with protected namespace "model_".
llama-gpt-api-1 |
llama-gpt-api-1 | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ('settings_',)`.
llama-gpt-api-1 | warnings.warn(
llama-gpt-api-1 | llama.cpp: loading model from /models/llama-2-70b-chat.bin
llama-gpt-api-1 | llama_model_load_internal: warning: assuming 70B model based on GQA == 8
llama-gpt-api-1 | llama_model_load_internal: format = ggjt v3 (latest)
llama-gpt-api-1 | llama_model_load_internal: n_vocab = 32001
llama-gpt-api-1 | llama_model_load_internal: n_ctx = 4096
llama-gpt-api-1 | llama_model_load_internal: n_embd = 8192
llama-gpt-api-1 | llama_model_load_internal: n_mult = 7168
llama-gpt-api-1 | llama_model_load_internal: n_head = 64
llama-gpt-api-1 | llama_model_load_internal: n_head_kv = 8
llama-gpt-api-1 | llama_model_load_internal: n_layer = 80
llama-gpt-api-1 | llama_model_load_internal: n_rot = 128
llama-gpt-api-1 | llama_model_load_internal: n_gqa = 8
llama-gpt-api-1 | llama_model_load_internal: rnorm_eps = 5.0e-06
llama-gpt-api-1 | llama_model_load_internal: n_ff = 28672
llama-gpt-api-1 | llama_model_load_internal: freq_base = 10000.0
llama-gpt-api-1 | llama_model_load_internal: freq_scale = 1
llama-gpt-api-1 | llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama-gpt-api-1 | llama_model_load_internal: model size = 70B
llama-gpt-api-1 | llama_model_load_internal: ggml ctx size = 0.07 MB
llama-gpt-api-1 | error loading model: llama.cpp: tensor 'layers.26.ffn_norm.weight' is missing from model
llama-gpt-api-1 | llama_load_model_from_file: failed to load model
llama-gpt-api-1 | Traceback (most recent call last):
llama-gpt-api-1 | File "<frozen runpy>", line 198, in _run_module_as_main
llama-gpt-api-1 | File "<frozen runpy>", line 88, in _run_code
llama-gpt-api-1 | File "/app/llama_cpp/server/__main__.py", line 46, in <module>
llama-gpt-api-1 | app = create_app(settings=settings)
llama-gpt-api-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
llama-gpt-api-1 | File "/app/llama_cpp/server/app.py", line 317, in create_app
llama-gpt-api-1 | llama = llama_cpp.Llama(
llama-gpt-api-1 | ^^^^^^^^^^^^^^^^
llama-gpt-api-1 | File "/app/llama_cpp/llama.py", line 328, in __init__
llama-gpt-api-1 | assert self.model is not None
llama-gpt-api-1 | ^^^^^^^^^^^^^^^^^^^^^^
llama-gpt-api-1 | AssertionError
llama-gpt-api-1 exited with code 1
llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api:8000] not yet available...
I'm unfortunately having the same issue.
Why am I getting all these errors? I'm also on Arch Linux.
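For what it's worth, the "tensor 'layers.26.ffn_norm.weight' is missing from model" error usually means the .bin file is truncated or corrupted, e.g. from a download interrupted by the earlier crash. A minimal sketch for sanity-checking the file before re-downloading (the /models path and the idea of comparing against a published checksum are assumptions, not something from the logs):

```python
import hashlib
import os

def file_sha256(path, chunk_size=1 << 20):
    """Hash the file in 1 MB chunks so a multi-GB model never has to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Demo against a small throwaway file; in practice you would point this at
# something like /models/llama-2-70b-chat.bin and compare the digest with the
# checksum published alongside the model download.
with open("demo.bin", "wb") as f:
    f.write(b"hello")

print(os.path.getsize("demo.bin"))  # file size in bytes → 5
print(file_sha256("demo.bin"))
```

If the size or digest doesn't match what the model's download page lists, deleting the file and letting run.sh fetch it again is the usual fix.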