getumbrel / llama-gpt

A self-hosted, offline, ChatGPT-like chatbot. Powered by Llama 2. 100% private, with no data leaving your device. New: Code Llama support!
https://apps.umbrel.com/app/llama-gpt
MIT License

Stuck on '[Host [llama-gpt-api-cuda-ggml:8000] not yet available...' #86

Closed · Proryanator closed this 1 year ago

Proryanator commented 1 year ago

My system has an i5-8400 and a GTX 1660 Super, and I'm running under WSL2 on Windows 10. I've also run into this issue on an Intel Mac. I'm getting the following message indefinitely, whether I run with --with-cuda or not:

[screenshot: "Host [llama-gpt-api-cuda-ggml:8000] not yet available..." repeating]

I thought it might have something to do with my Pi-hole instance handling DNS, but after switching to my normal router I still get this error, seemingly forever.

I tried changing the port to 8001 and changing the hostname to localhost directly, but I get the same thing. I also verified that nothing is running on port 8000 on my PC or my Mac.
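For reference, one way to verify that (the same netstat check that appears later in this thread; `ss -tulpn` works too on newer systems):

```
# Empty output means nothing is listening on the port
sudo netstat -tulpn | grep :8000
```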

To avoid permission issues I've been running sudo ./run.sh --model 7b --with-cuda as well, so there are no errors about storing models or anything like that.

Thanks!

BeachUnicorn commented 1 year ago

I got the same issue, but with ./run.sh --model 70b, running Windows 10 as well... but should that matter when it's containerized? I let it run all night but still get "not yet available".

./run.sh --model code-13b is the only one that works for me so far.

With ./run.sh --model code-34b I get all the way to the website, but the form is missing and I can't chat.

vient41 commented 1 year ago

Same issue running on an Ubuntu VM with ./run.sh --model 7b:

```
thomas@template:~/llama-gpt$ sudo ./run.sh --model 7b
[+] Building 1.1s (20/20) FINISHED
 => [llama-gpt-ui internal] load build definition from Dockerfile  0.0s
 => => transferring dockerfile: 859B  0.0s
 => [llama-gpt-ui internal] load .dockerignore  0.0s
 => => transferring context: 82B  0.0s
 => [llama-gpt-ui internal] load metadata for ghcr.io/ufoscout/docker-compose-wait:latest  0.5s
 => [llama-gpt-ui internal] load metadata for docker.io/library/node:19-alpine  1.0s
 => [llama-gpt-ui] FROM ghcr.io/ufoscout/docker-compose-wait:latest@sha256:c8f918b9ce47a74e90813c94f2078a086ff4b4023ba282802a0a9a27dcc0383a  0.0s
 => [llama-gpt-ui base 1/3] FROM docker.io/library/node:19-alpine@sha256:8ec543d4795e2e85af924a24f8acb039792ae9fe8a42ad5b4bf4c277ab34b62e  0.0s
 => [llama-gpt-ui internal] load build context  0.0s
 => => transferring context: 13.73kB  0.0s
 => CACHED [llama-gpt-ui base 2/3] WORKDIR /app  0.0s
 => CACHED [llama-gpt-ui base 3/3] COPY package.json ./  0.0s
 => CACHED [llama-gpt-ui dependencies 1/1] RUN npm ci  0.0s
 => CACHED [llama-gpt-ui production 3/9] COPY --from=dependencies /app/node_modules ./node_modules  0.0s
 => CACHED [llama-gpt-ui build 1/2] COPY . .  0.0s
 => CACHED [llama-gpt-ui build 2/2] RUN npm run build  0.0s
 => CACHED [llama-gpt-ui production 4/9] COPY --from=build /app/.next ./.next  0.0s
 => CACHED [llama-gpt-ui production 5/9] COPY --from=build /app/public ./public  0.0s
 => CACHED [llama-gpt-ui production 6/9] COPY --from=build /app/package.json ./  0.0s
 => CACHED [llama-gpt-ui production 7/9] COPY --from=build /app/next.config.js ./next.config.js  0.0s
 => CACHED [llama-gpt-ui production 8/9] COPY --from=build /app/next-i18next.config.js ./next-i18next.config.js  0.0s
 => CACHED [llama-gpt-ui production 9/9] COPY --from=ghcr.io/ufoscout/docker-compose-wait:latest /wait /wait  0.0s
 => [llama-gpt-ui] exporting to image  0.0s
 => => exporting layers  0.0s
 => => writing image sha256:1d6b8bc4e651e7c9f336c7946a78d61da2b3f93a89454fc5c19ad00766e51766  0.0s
 => => naming to docker.io/library/llama-gpt-llama-gpt-ui  0.0s
[+] Running 2/0
 ✔ Container llama-gpt-llama-gpt-ui-1   Created  0.0s
 ✔ Container llama-gpt-llama-gpt-api-1  Running  0.0s
Attaching to llama-gpt-llama-gpt-api-1, llama-gpt-llama-gpt-ui-1
llama-gpt-llama-gpt-ui-1 | [INFO wait] --------------------------------------------------------
llama-gpt-llama-gpt-ui-1 | [INFO wait] docker-compose-wait 2.12.0
llama-gpt-llama-gpt-ui-1 | [INFO wait] ---------------------------
llama-gpt-llama-gpt-ui-1 | [DEBUG wait] Starting with configuration:
llama-gpt-llama-gpt-ui-1 | [DEBUG wait]  - Hosts to be waiting for: [llama-gpt-api:8000]
llama-gpt-llama-gpt-ui-1 | [DEBUG wait]  - Paths to be waiting for: []
llama-gpt-llama-gpt-ui-1 | [DEBUG wait]  - Timeout before failure: 3600 seconds
llama-gpt-llama-gpt-ui-1 | [DEBUG wait]  - TCP connection timeout before retry: 5 seconds
llama-gpt-llama-gpt-ui-1 | [DEBUG wait]  - Sleeping time before checking for hosts/paths availability: 0 seconds
llama-gpt-llama-gpt-ui-1 | [DEBUG wait]  - Sleeping time once all hosts/paths are available: 0 seconds
llama-gpt-llama-gpt-ui-1 | [DEBUG wait]  - Sleeping time between retries: 1 seconds
llama-gpt-llama-gpt-ui-1 | [DEBUG wait] --------------------------------------------------------
llama-gpt-llama-gpt-ui-1 | [INFO wait] Checking availability of host [llama-gpt-api:8000]
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api:8000] not yet available...
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api:8000] not yet available...
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api:8000] not yet available...
[... the last line repeats indefinitely ...]
```

Proryanator commented 1 year ago

So I did try what @BeachUnicorn tried, running ./run.sh --model code-13b, and I did get to localhost. However, I get an internal error when entering a prompt.

I noticed that when running the 7b model, my RAM continues to climb while that warning is showing.
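A quick way to watch that per-container memory usage is `docker stats`, which ships with Docker:

```
# Live CPU/memory usage per container; handy for seeing whether
# the api container is still loading the model into RAM
docker stats
```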

Appears to work as expected if you wait 😅

barshag commented 1 year ago

For me (inside a g5.xlarge on AWS), it got stuck in a loop, with llama-gpt-llama-gpt-api-cuda-gguf-1 exited with code 137 appearing between the "not yet available" messages.
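Exit code 137 is 128 + 9, i.e. the process was killed with SIGKILL; in a Docker context that is very often the kernel's OOM killer. Two ways to confirm (the container name here is taken from the message above):

```
# True if Docker recorded the container as OOM-killed
docker inspect --format '{{.State.OOMKilled}}' llama-gpt-llama-gpt-api-cuda-gguf-1

# The host kernel log usually records the kill as well
sudo dmesg | grep -i -E 'killed process|out of memory'
```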

HaloFan62 commented 1 year ago

Is Llama AI dead? Getting the same issue now that I've reinstalled. Been waiting almost 30 minutes now... ah, never mind, back to the assertion error. Square one, great, time to uninstall all this lmao. Hopefully it's all fixed soon.

AkaP88 commented 1 year ago

I encountered the same issue (Ubuntu / WSL 2 with CUDA), but it works great after 15 minutes. Restarting the Docker containers requires another 15.

AmrMonier commented 1 year ago

Did any of you guys manage to figure this out? I'm still unable to resolve this issue and I'm getting the same:



```
exited with code 137
llama-gpt-llama-gpt-api-1  | /usr/local/lib/python3.11/site-packages/setuptools/command/develop.py:40: EasyInstallDeprecationWarning: easy_install command is deprecated.
llama-gpt-llama-gpt-api-1  | !!
llama-gpt-llama-gpt-api-1  | 
llama-gpt-llama-gpt-api-1  |         ********************************************************************************
llama-gpt-llama-gpt-api-1  |         Please avoid running ``setup.py`` and ``easy_install``.
llama-gpt-llama-gpt-api-1  |         Instead, use pypa/build, pypa/installer or other
llama-gpt-llama-gpt-api-1  |         standards-based tools.
llama-gpt-llama-gpt-api-1  | 
llama-gpt-llama-gpt-api-1  |         See https://github.com/pypa/setuptools/issues/917 for details.
llama-gpt-llama-gpt-api-1  |         ********************************************************************************
llama-gpt-llama-gpt-api-1  | 
llama-gpt-llama-gpt-api-1  | !!
llama-gpt-llama-gpt-api-1  |   easy_install.initialize_options(self)
llama-gpt-llama-gpt-api-1  | [0/1] Install the project...
llama-gpt-llama-gpt-api-1  | -- Install configuration: "Release"
llama-gpt-llama-gpt-api-1  | -- Up-to-date: /app/_skbuild/linux-x86_64-3.11/cmake-install/llama_cpp/libllama.so
llama-gpt-llama-gpt-api-1  | copying _skbuild/linux-x86_64-3.11/cmake-install/llama_cpp/libllama.so -> llama_cpp/libllama.so
llama-gpt-llama-gpt-api-1  | 
llama-gpt-llama-gpt-api-1  | running develop
llama-gpt-llama-gpt-api-1  | /usr/local/lib/python3.11/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
llama-gpt-llama-gpt-api-1  | !!
llama-gpt-llama-gpt-api-1  | 
llama-gpt-llama-gpt-api-1  |         ********************************************************************************
llama-gpt-llama-gpt-api-1  |         Please avoid running ``setup.py`` directly.
llama-gpt-llama-gpt-api-1  |         Instead, use pypa/build, pypa/installer or other
llama-gpt-llama-gpt-api-1  |         standards-based tools.
llama-gpt-llama-gpt-api-1  | 
llama-gpt-llama-gpt-api-1  |         See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
llama-gpt-llama-gpt-api-1  |         ********************************************************************************
llama-gpt-llama-gpt-api-1  | 
llama-gpt-llama-gpt-api-1  | !!
llama-gpt-llama-gpt-api-1  |   self.initialize_options()
llama-gpt-llama-gpt-api-1  | running egg_info
llama-gpt-llama-gpt-api-1  | writing llama_cpp_python.egg-info/PKG-INFO
llama-gpt-llama-gpt-api-1  | writing dependency_links to llama_cpp_python.egg-info/dependency_links.txt
llama-gpt-llama-gpt-api-1  | writing requirements to llama_cpp_python.egg-info/requires.txt
llama-gpt-llama-gpt-api-1  | writing top-level names to llama_cpp_python.egg-info/top_level.txt
llama-gpt-llama-gpt-api-1  | reading manifest file 'llama_cpp_python.egg-info/SOURCES.txt'
llama-gpt-llama-gpt-api-1  | adding license file 'LICENSE.md'
llama-gpt-llama-gpt-api-1  | writing manifest file 'llama_cpp_python.egg-info/SOURCES.txt'
llama-gpt-llama-gpt-api-1  | running build_ext
llama-gpt-llama-gpt-api-1  | Creating /usr/local/lib/python3.11/site-packages/llama-cpp-python.egg-link (link to .)
llama-gpt-llama-gpt-api-1  | llama-cpp-python 0.1.78 is already the active version in easy-install.pth
llama-gpt-llama-gpt-api-1  | 
llama-gpt-llama-gpt-api-1  | Installed /app
llama-gpt-llama-gpt-api-1  | Processing dependencies for llama-cpp-python==0.1.78
llama-gpt-llama-gpt-api-1  | Searching for diskcache==5.6.1
llama-gpt-llama-gpt-api-1  | Best match: diskcache 5.6.1
llama-gpt-llama-gpt-api-1  | Processing diskcache-5.6.1-py3.11.egg
llama-gpt-llama-gpt-api-1  | Adding diskcache 5.6.1 to easy-install.pth file
llama-gpt-llama-gpt-api-1  | 
llama-gpt-llama-gpt-api-1  | Using /usr/local/lib/python3.11/site-packages/diskcache-5.6.1-py3.11.egg
llama-gpt-llama-gpt-api-1  | Searching for numpy==1.26.0b1
llama-gpt-llama-gpt-api-1  | Best match: numpy 1.26.0b1
llama-gpt-llama-gpt-api-1  | Processing numpy-1.26.0b1-py3.11-linux-x86_64.egg
llama-gpt-llama-gpt-api-1  | Adding numpy 1.26.0b1 to easy-install.pth file
llama-gpt-llama-gpt-api-1  | Installing f2py script to /usr/local/bin
llama-gpt-llama-gpt-api-1  | 
llama-gpt-llama-gpt-api-1  | Using /usr/local/lib/python3.11/site-packages/numpy-1.26.0b1-py3.11-linux-x86_64.egg
llama-gpt-llama-gpt-api-1  | Searching for typing-extensions==4.7.1
llama-gpt-llama-gpt-api-1  | Best match: typing-extensions 4.7.1
llama-gpt-llama-gpt-api-1  | Adding typing-extensions 4.7.1 to easy-install.pth file
llama-gpt-llama-gpt-api-1  | 
llama-gpt-llama-gpt-api-1  | Using /usr/local/lib/python3.11/site-packages
llama-gpt-llama-gpt-api-1  | Finished processing dependencies for llama-cpp-python==0.1.78
llama-gpt-llama-gpt-api-1  | Initializing server with:
llama-gpt-llama-gpt-api-1  | Batch size: 1024
llama-gpt-llama-gpt-api-1  | Number of CPU threads: 8
llama-gpt-llama-gpt-api-1  | Number of GPU layers: 0
llama-gpt-llama-gpt-api-1  | Context window: 4096
llama-gpt-llama-gpt-ui-1   | [INFO  wait] Host [llama-gpt-api:8000] not yet available...
llama-gpt-llama-gpt-ui-1   | [INFO  wait] Host [llama-gpt-api:8000] not yet available...
llama-gpt-llama-gpt-api-1  | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:127: UserWarning: Field "model_alias" has conflict with protected namespace "model_".
```

Nisse123 commented 1 year ago

I think the codebase is broken right now; someone has to fix this issue: https://github.com/abetlen/llama-cpp-python/issues/520

peterhoila commented 1 year ago

I had the same issue with all of the models, even after letting them run for a while.

GreatNewHope commented 1 year ago

In case it helps: I have 32 GB of RAM, a Linux machine, using --with-cuda:

- Llama 2 7B: runs perfectly.
- Llama 2 13B: runs perfectly.
- Llama 2 70B: stuck on "Host not yet available". Checking my RAM usage, I can see that LlamaGPT is trying to use more than my 32 GB of RAM. It appears to be stuck in a loop of allocating memory, running out of memory, freeing the used memory, and then trying to allocate it again (see the rough estimate below). This might explain some of the errors some users are experiencing.

With CodeLlama, it gets stuck on every model, but it appears that LlamaGPT is not even trying to allocate any memory.
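A rough back-of-envelope supports the RAM theory. At ~4-bit quantization the weights alone take roughly 0.56 bytes per parameter (an approximation for q4-style quants), before adding the context/KV cache and runtime overhead:

```
# Very rough weights-only estimate at ~4-bit quantization (~0.56 bytes/param);
# real usage is higher once the KV cache and overhead are added
for params in 7 13 70; do
  echo "${params}B model: ~$(( params * 56 / 100 )) GB of quantized weights"
done
```

~39 GB of weights for the 70B model, plus overhead, comfortably exceeds 32 GB of RAM, which would explain the allocate/fail/retry loop described above.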

dakshesh14 commented 1 year ago

Hi, this happened to me as well. Since the output told me to wait ([INFO wait]), I just waited.

llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api:8000] not yet available...

I also checked the models/ folder against the file sizes shown here. As long as I saw some progress, I assumed everything was fine. After some time, port 8000 became available. So I think the key is to wait 😅. Monitoring resources and download progress can also be helpful.
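A simple way to do that monitoring, assuming the model is being downloaded into models/ inside the cloned repo:

```
# Refresh every 10 seconds; if the model file keeps growing,
# the download is still making progress
watch -n 10 'ls -lh models/'
```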

Proryanator commented 1 year ago

Yeup, can confirm it's just a waiting game. Going to go ahead and close this out.

teto commented 1 year ago

I ran the same commands on two different computers; it worked fine on the first but failed on the second with:

```
llama-gpt-llama-gpt-api-1  | llama_model_load_internal: ggml ctx size =    0.01 MB
llama-gpt-llama-gpt-api-1  | error loading model: llama.cpp: tensor 'layers.1.ffn_norm.weight' is missing from model
llama-gpt-llama-gpt-api-1  | llama_load_model_from_file: failed to load model
llama-gpt-llama-gpt-api-1  | Traceback (most recent call last):
llama-gpt-llama-gpt-api-1  |   File "<frozen runpy>", line 198, in _run_module_as_main
llama-gpt-llama-gpt-api-1  |   File "<frozen runpy>", line 88, in _run_code
llama-gpt-llama-gpt-api-1  |   File "/app/llama_cpp/server/__main__.py", line 46, in <module>
llama-gpt-llama-gpt-api-1  |     app = create_app(settings=settings)
llama-gpt-llama-gpt-api-1  |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
llama-gpt-llama-gpt-api-1  |   File "/app/llama_cpp/server/app.py", line 317, in create_app
llama-gpt-llama-gpt-api-1  |     llama = llama_cpp.Llama(
llama-gpt-llama-gpt-api-1  |             ^^^^^^^^^^^^^^^^
llama-gpt-llama-gpt-api-1  |   File "/app/llama_cpp/llama.py", line 328, in __init__
llama-gpt-llama-gpt-api-1  |     assert self.model is not None
llama-gpt-llama-gpt-api-1  |            ^^^^^^^^^^^^^^^^^^^^^^
llama-gpt-llama-gpt-api-1  | AssertionError
llama-gpt-llama-gpt-api-1 exited with code 1
llama-gpt-llama-gpt-ui-1   | [INFO  wait] Host [llama-gpt-api:8000] not yet available...
llama-gpt-llama-gpt-ui-1   | [INFO  wait] Host [llama-gpt-api:8000] not yet available...
llama-gpt-llama-gpt-ui-1   | [INFO  wait] Host [llama-gpt-api:8000] not yet available...
```

I checked the files in the models/ folder, and on the second computer the file was much smaller; indeed, I had a network failure around that time. It could be useful to check the downloaded models against some hash.
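As a sketch of what that check could look like (the file name and expected hash below are placeholders; the real values would have to come from wherever the model is published):

```
# Verify a downloaded model against a published SHA-256.
# Replace the file name and EXPECTED value with the real ones for your model.
cd models/
EXPECTED="paste-the-published-sha256-here"
ACTUAL=$(sha256sum llama-2-7b-chat.bin | awk '{print $1}')
if [ "$ACTUAL" = "$EXPECTED" ]; then
  echo "Checksum OK"
else
  echo "Checksum mismatch: the file is likely truncated or corrupted; re-download it"
fi
```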

sss1337xyz commented 8 months ago

> I checked the files in the models/ folder

Where is that folder placed?

The folder is located in the root of your cloned repository.

ProgrammingLife commented 6 months ago

The Phind 34B model doesn't even start either. It hangs on these lines:

```
$ ./run.sh --model code-34b --with-cuda
...
llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-gguf:8000] not yet available...
llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-gguf:8000] not yet available...
llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-gguf:8000] not yet available...
```

```
$ sudo netstat -tulpn | grep :8000
$ sudo netstat -tulpn | grep :3000
$
```

7b and 13b work perfectly.

3080 Ti (16 GB), 32 GB RAM, Arch Linux; about 7 GB of RAM is allocated when it's stuck on those lines.
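When netstat shows nothing listening on 8000, following the API container's logs is one way to see whether it is still loading or has silently crashed (the service name here is taken from the log lines above):

```
# Stream the API container's logs while the UI waits
docker compose logs -f llama-gpt-api-cuda-gguf
```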

shareef-dweikat commented 5 months ago

Has anyone found a solution? I have the same problem on my Mac Pro M1:

```
llama-gpt-llama-gpt-api-1 | File "<frozen runpy>", line 198, in _run_module_as_main
llama-gpt-llama-gpt-api-1 | File "<frozen runpy>", line 88, in _run_code
llama-gpt-llama-gpt-api-1 | File "/app/llama_cpp/server/__main__.py", line 46, in <module>
llama-gpt-llama-gpt-api-1 |   app = create_app(settings=settings)
llama-gpt-llama-gpt-api-1 |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
llama-gpt-llama-gpt-api-1 | File "/app/llama_cpp/server/app.py", line 317, in create_app
llama-gpt-llama-gpt-api-1 |   llama = llama_cpp.Llama(
llama-gpt-llama-gpt-api-1 |           ^^^^^^^^^^^^^^^^
llama-gpt-llama-gpt-api-1 | File "/app/llama_cpp/llama.py", line 328, in __init__
llama-gpt-llama-gpt-api-1 |   assert self.model is not None
llama-gpt-llama-gpt-api-1 |          ^^^^^^^^^^^^^^^^^^^^^^
llama-gpt-llama-gpt-api-1 | AssertionError
llama-gpt-llama-gpt-api-1 exited with code 1
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api:8000] not yet available...
```
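Given teto's earlier finding, this AssertionError typically means the model file on disk is incomplete or corrupt. One hedged fix, assuming run.sh re-downloads missing models as it does on first run (the file name below is illustrative; check models/ for the actual one):

```
# Remove the suspect model file and re-run so it gets fetched again
rm models/llama-2-7b-chat.bin
./run.sh --model 7b
```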