Offlinedsad opened 1 year ago
Try lowering the n_layers in the settings.
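If it helps anyone: on a Docker/Unraid setup that's usually an environment variable on the container. A minimal sketch, where the variable name and image are assumptions on my part, not confirmed names from this project (copy the exact keys from your Unraid template):

# N_GPU_LAYERS and the image name below are placeholders; check your
# template for the real names before running this.
docker run -d --gpus all \
  -e N_GPU_LAYERS=1 \
  -p 3000:3000 \
  -v /mnt/user/appdata/llama-gpt/models:/models \
  <your-llama-gpt-image>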
Same here. Lowered n_layers all the way down to 1 and still nothing. The log isn't showing any errors. Driver 535.54.03.
UPDATE: just tried the CPU version. Same result. Not working
@edgar971 Yeah, same here. Other things are working fine with the Nvidia driver, but your container won't start at all. No errors in the log; it doesn't even update when trying to start the container.
All I get is:
==========
== CUDA ==
==========
CUDA Version 12.2.0
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
/models/llama-2-7b-chat.bin model found.
Initializing server with:
Batch size: 2096
Number of CPU threads: 48
Context window: 4096
The thing spins for a second in Unraid, then goes back to stopped. Tried dropping layers to 1, etc. No luck.
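For anyone else hitting this: the Unraid UI hides the exit status, and pulling it straight from Docker can tell you why it died. A sketch, assuming the container is named llama-gpt (substitute yours):

docker ps -a --filter name=llama-gpt      # status line shows the exit code
docker inspect --format '{{.State.ExitCode}}' llama-gpt
docker logs --tail 50 llama-gpt           # last output before it stopped
# Exit code 132 is SIGILL: the binary used an instruction the CPU lacks
# (e.g. AVX2), which fits a container that dies with a clean-looking log.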
Similar issue here with the CPU version: it appears to start and the logs indicate it's running, but the container shuts down. If the model doesn't exist, it downloads it fresh and then ends in the same state.
You may be able to resolve this warning by setting
`model_config['protected_namespaces'] = ('settings_',)`.
warnings.warn(
llama.cpp: loading model from /models/llama-2-7b-chat.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 4096
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_head_kv = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: n_gqa = 1
llama_model_load_internal: rnorm_eps = 5.0e-06
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: freq_base = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 0.08 MB
/models/llama-2-7b-chat.bin model found.
Initializing server with:
Batch size: 2096
Number of CPU threads: 40
Context window: 4096
ai-chatbot-starter@0.1.0 start
next start

ready - started server on 0.0.0.0:3000, url: http://localhost:3000
/models/llama-2-7b-chat.bin model found.
Initializing server with:
Batch size: 2096
Number of CPU threads: 40
Context window: 4096

ai-chatbot-starter@0.1.0 start
next start

ready - started server on 0.0.0.0:3000, url: http://localhost:3000
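When it starts and then dies silently like that, it's worth ruling out the kernel OOM killer, since a 7B model plus a 4096 context window can exhaust RAM. A sketch, again assuming llama-gpt as the container name:

docker inspect --format 'oom={{.State.OOMKilled}} exit={{.State.ExitCode}}' llama-gpt
# oom=true means the kernel killed it for memory; exit=132 is SIGILL
# (an instruction the CPU lacks, e.g. AVX2); exit=137 is SIGKILL, often OOM.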
Same issue here.
It seems to run just fine on my Unraid setup (CUDA).
I don't have a solution, but I was able to figure out why I couldn't get it working on Unraid. My Unraid server is an older R720XD, and the processors in it don't support AVX2. To verify, I loaded it up on my Proxmox host (well, in the VM running Docker), which does support AVX2, and I was in.
If you aren't sure whether your processor supports it (and don't feel like Googling it), pop open a console (or SSH in) and run:
grep -o 'avx[^ ]*' /proc/cpuinfo
That being said, I asked "Are you available" and it took 15 minutes to respond. I tried a harder query (asking it to write a PowerShell script to download a file using BITS); it took about an hour and a half and presented me with a Python script to do it, lol. Still, it totally worked, and it would have run even better if I weren't trying to run it on ancient enterprise hardware!
I'll revisit it once I can find a GPU to throw in that server.
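A slightly tighter version of that check, since /proc/cpuinfo repeats the flags once per core (sort -u just dedupes):

grep -o 'avx[^ ]*' /proc/cpuinfo | sort -u
# You want to see both avx and avx2 in the output; if avx2 is missing,
# a llama.cpp build compiled for AVX2 will crash with an illegal instruction.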
Hi, I have a 1050 Ti and an AMD Phenom II X6 1090T alongside 16 GB of RAM. I'm having issues with the Docker container not starting (logs are below). I have the Nvidia drivers set up and working, and I'm on version 535.104.05.
==========
== CUDA ==
==========

CUDA Version 12.2.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

/models/llama-2-7b-chat.bin model found.
Initializing server with:
Batch size: 2096
Number of CPU threads: 6
Context window: 4096

Container stopped
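Two things worth checking on that box (a sketch, not from the thread; the CUDA image tag below is just one that matches the log above). Also note the Phenom II X6 predates AVX entirely, so the AVX2 check from earlier in the thread applies here as well:

# Confirm Docker can actually reach the GPU through the Nvidia runtime
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

# Confirm the host CPU advertises AVX/AVX2 (a Phenom II will not)
grep -o 'avx[^ ]*' /proc/cpuinfo | sort -u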