edgar971 / open-chat

A self-hosted, offline, ChatGPT-like chatbot with different LLM support. 100% private, with no data leaving your device.
MIT License

Docker not starting (Unraid) #9

Open Offlinedsad opened 1 year ago

Offlinedsad commented 1 year ago

Hi, I have a 1050 Ti and an AMD Phenom™ II X6 1090T alongside 16GB of RAM. I'm having issues with the Docker container not starting (logs are below). I have the Nvidia drivers set up and working, and I'm on version 535.104.05.

09/12/2023 7:34:47 PM
09/12/2023 7:34:47 PM
09/12/2023 7:34:47 PM == CUDA ==
09/12/2023 7:34:47 PM
09/12/2023 7:34:47 PM
09/12/2023 7:34:47 PM CUDA Version 12.2.0
09/12/2023 7:34:47 PM
09/12/2023 7:34:47 PM Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
09/12/2023 7:34:47 PM
09/12/2023 7:34:47 PM This container image and its contents are governed by the NVIDIA Deep Learning Container License.
09/12/2023 7:34:47 PM By pulling and using the container, you accept the terms and conditions of this license:
09/12/2023 7:34:47 PM https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
09/12/2023 7:34:47 PM
09/12/2023 7:34:47 PM A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
09/12/2023 7:34:47 PM
09/12/2023 7:34:47 PM /models/llama-2-7b-chat.bin model found.
09/12/2023 7:34:47 PM Initializing server with:
09/12/2023 7:34:47 PM Batch size: 2096
09/12/2023 7:34:47 PM Number of CPU threads: 6
09/12/2023 7:34:47 PM Context window: 4096
Container stopped

edgar971 commented 1 year ago

Try lowering the n_layers in the settings.

akshun-j commented 1 year ago

Same here. Lowered n_layers all the way down to 1 and still nothing. Log is not showing any errors. Driver 535.54.03

UPDATE: Just tried the CPU version. Same result. Not working.

anethema commented 1 year ago

@edgar971 Yeah, same here. Other things work fine with the Nvidia driver, but your container won't start at all. No errors in the log. The log doesn't even update when trying to start the container.

All I get is:

==========
== CUDA ==
==========

CUDA Version 12.2.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

/models/llama-2-7b-chat.bin model found.
Initializing server with:
Batch size: 2096
Number of CPU threads: 48
Context window: 4096

The thing spins for a second in Unraid, then goes back to stopped. Tried dropping layers to 1, etc. No luck.

ITGuyLevi commented 1 year ago

Similar issue here with the CPU version: it appears to start and the logs indicate that it's running, but the container shuts down. If the model doesn't exist, it downloads it fresh and then ends up in the same state.

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ('settings',)`.
warnings.warn(
llama.cpp: loading model from /models/llama-2-7b-chat.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 4096
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_head_kv = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: n_gqa = 1
llama_model_load_internal: rnorm_eps = 5.0e-06
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: freq_base = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 0.08 MB
/models/llama-2-7b-chat.bin model found.
Initializing server with:
Batch size: 2096
Number of CPU threads: 40
Context window: 4096

ai-chatbot-starter@0.1.0 start
next start

ready - started server on 0.0.0.0:3000, url: http://localhost:3000
/models/llama-2-7b-chat.bin model found.
Initializing server with:
Batch size: 2096
Number of CPU threads: 40
Context window: 4096

ai-chatbot-starter@0.1.0 start
next start

ready - started server on 0.0.0.0:3000, url: http://localhost:3000

voioo commented 1 year ago

Same issue here.

LunarstarPony commented 1 year ago

It seems to run just fine on my UnRAID Setup (Cuda)

ITGuyLevi commented 1 year ago

I don't have a solution, but I was able to figure out why I couldn't get it working on Unraid. My Unraid server is an older R720XD, and its processors don't support AVX2. To verify, I loaded it up on my Proxmox host (or rather, in the VM running Docker on it), which does support AVX2, and I was in.

If you aren't sure if your processor supports it (and don't feel like Googling it), pop open a console (or ssh in) and run:

grep -o 'avx[^ ]*' /proc/cpuinfo
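For anyone who would rather script the check, here is a minimal Python sketch of the same idea. It assumes an x86 Linux host where /proc/cpuinfo contains a `flags` line; the function names `cpu_flags` and `avx_support` are made up for illustration and are not part of this project.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect CPU feature flags from /proc/cpuinfo-style text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the line whose key is exactly "flags" lists CPU features
        # (assumption: x86 Linux layout).
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def avx_support(cpuinfo_text: str) -> dict[str, bool]:
    """Report whether the avx and avx2 flags are present."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in ("avx", "avx2")}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        for name, present in avx_support(path.read_text()).items():
            print(f"{name}: {'yes' if present else 'no'}")
    else:
        print("/proc/cpuinfo not found; this check only works on Linux")
```

If `avx2` comes back as missing, that would match what I saw on the R720XD.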

That being said, I asked "Are you available", and it took 15 minutes to respond. I tried a harder query (asking it to write a PowerShell script to download a file using BITS); it took about an hour and a half and presented me with a Python script to do it, lol. Still, it totally worked and would have run even better if I weren't trying to run it on ancient enterprise hardware!

I'll revisit it once I can find a GPU to throw in that server.