abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

CUDA Forward Compatibility on non supported HW #234

Closed: Pipboyguy closed this issue 1 year ago

Pipboyguy commented 1 year ago

Expected Behavior

There's no tagged CUDA image on ghcr, so after building the Dockerfile.cuda image I run the following and expect the server to start and serve the model on port 8000:

docker run --gpus=all --rm -it -p 8000:8000 -v /home/***/models:/models -e MODEL=/models/GPT4-X-Alpasta-30b_q4_0.bin llama_cpp_server_cuda
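(For reference, the image was built locally from the repo's Dockerfile.cuda before running the command above; the exact build invocation isn't in the original report, but it would look something like this, with the tag matching the run command:)

$ docker build -t llama_cpp_server_cuda -f Dockerfile.cuda .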

Current Behavior

==========
== CUDA ==
==========

CUDA Version 12.1.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

llama.cpp: loading model from /models/GPT4-X-Alpasta-30b_q4_0.bin
llama_model_load_internal: format     = ggjt v2 (latest)
llama_model_load_internal: n_vocab    = 32016
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 6656
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 52
llama_model_load_internal: n_layer    = 60
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 17920
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size = 135.75 KB
llama_model_load_internal: mem required  = 21695.61 MB (+ 3124.00 MB per state)
WARNING: failed to allocate 0.13 MB of pinned memory: forward compatibility was attempted on non supported HW
CUDA error 804 at ggml-cuda.cu:405: forward compatibility was attempted on non supported HW
abetlen commented 1 year ago

@Pipboyguy a couple of things to check: what version of CUDA do you have installed on your base OS? Also, this may sound stupid, but most CUDA bugs are: have you tried rebooting since the most recent CUDA installation?
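(Two standard ways to check this on the host; these are generic NVIDIA tools, not specific to this project:)

$ nvidia-smi      # reports the driver version and the highest CUDA version that driver supports
$ nvcc --version  # reports the installed CUDA toolkit version, if the toolkit is present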

Pipboyguy commented 1 year ago

llama_cpp.server works perfectly outside of Docker in a virtualenv, so I believe this is isolated to the nvidia-docker setup.

Here's some info about my host system:

Driver Version: 525.105.17
CUDA Version: 12.0
OS: Ubuntu 22.04
Pipboyguy commented 1 year ago

Please see PR https://github.com/abetlen/llama-cpp-python/pull/235; it eliminates the issue.

Pipboyguy commented 1 year ago

This only eliminates it for me, so it would be nice to get more testers.
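(For anyone who wants to test, one way to build the image from that PR branch; the commands below are a standard GitHub PR checkout and are assumed rather than taken from the PR itself:)

$ git clone https://github.com/abetlen/llama-cpp-python.git && cd llama-cpp-python
$ git fetch origin pull/235/head:pr-235 && git checkout pr-235
$ docker build -t llama_cpp_server_cuda -f Dockerfile.cuda .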

gjmulder commented 1 year ago

Exactly what NVidia GPU are you having issues with? 12.1 seems to support my ancient GTX 1080Ti (Pascal architecture). I have a 980Ti (Maxwell) somewhere, but I'd have to plug it in and hope it still works.

CUDA Compatibility Matrix

Pipboyguy commented 1 year ago

> Exactly what NVidia GPU are you having issues with? 12.1 seems to support my ancient GTX 1080Ti (Pascal architecture). I have a 980Ti (Maxwell) somewhere, but I'd have to plug it in and hope it still works.
>
> CUDA Compatibility Matrix

Running an RTX 4080. Outdated drivers, perhaps?

gjmulder commented 1 year ago

FROM nvidia/cuda:12.1.1-devel-ubuntu20.04

is working with my 3090Ti.
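(A quick way to sanity-check that the host driver and the NVIDIA container runtime work with that base image, independent of llama.cpp; this is a standard check rather than something from the original comment:)

$ docker run --rm --gpus all nvidia/cuda:12.1.1-devel-ubuntu20.04 nvidia-smi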

My Ubuntu Docker host:

$ uname -a
Linux asushimu 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/issue
Ubuntu 22.04.2 LTS \n \l

$ dpkg -l | grep "^ii.*nvidia-driver"
ii  nvidia-driver-530                      530.30.02-0ubuntu1                      amd64        NVIDIA driver metapackage
Pipboyguy commented 1 year ago

My Pop!_OS host:

$ uname -a
Linux workstation 6.2.6-76060206-generic #202303130630~1683753207~22.04~77c1465 SMP PREEMPT_DYNAMIC Wed M x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/issue
Pop!_OS 22.04 LTS

$ dpkg -l | grep "^ii.*nvidia-driver"
ii  nvidia-driver-525                       525.105.17-1pop0~1681323337~22.04~22e0810                                       amd64        NVIDIA driver metapackage
gjmulder commented 1 year ago

Should be addressed in #258.

Pipboyguy commented 1 year ago

Shall we close this issue?

d0rc commented 1 year ago

Just got it on a 4090 with the latest master:

Commit: 2d7bf110edd8c49209401a16132052cba706ffd0
Built with: make LLAMA_CUBLAS=1

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
uname -a
Linux ddf7cfbdde40 5.15.0-73-generic #80~20.04.1-Ubuntu SMP Wed May 17 14:58:14 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
llama.cpp/main -m ./guanaco-65B.ggmlv3.q4_0.bin -p "hello!"
main: build = 631 (2d7bf11)
main: seed  = 1686104573
CUDA error 804 at ggml-cuda.cu:1039: forward compatibility was attempted on non supported HW
gjmulder commented 1 year ago

Are you running in a virtualized environment and trying to access your NVidia GPU?

If so, try updating your NVidia driver in your VM or Docker instance.
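(A quick way to see the mismatch is to compare the CUDA version the host driver supports with the CUDA toolkit baked into the image. These are standard commands; the image tag is just the one discussed in this thread, and nvcc being on PATH in the devel image is an assumption:)

$ nvidia-smi                                                                          # host driver and the CUDA version it supports
$ docker run --rm --gpus all nvidia/cuda:12.1.1-devel-ubuntu20.04 nvcc --version     # CUDA toolkit inside the image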

Pipboyguy commented 1 year ago

What worked for me was upgrading the NVIDIA driver on the host; after that, CUDA 12.1 should work. Also try CUDA 11.7 if upgrading the NVIDIA driver is a pain. Very likely the issue in your case as well, @d0rc.
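(A sketch of what that upgrade can look like on an Ubuntu-based host; the 530-series metapackage is the one gjmulder reports above, and availability depends on your distro's repositories. The alternative is to change the FROM line in Dockerfile.cuda to an 11.7 base image such as nvidia/cuda:11.7.1-devel-ubuntu22.04 and rebuild:)

$ sudo apt update
$ sudo apt install nvidia-driver-530   # or a newer driver series that supports CUDA 12.1
$ sudo reboot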

gjmulder commented 1 year ago

> What worked for me was upgrading the NVIDIA driver on the host; after that, CUDA 12.1 should work. Also try CUDA 11.7 if upgrading the NVIDIA driver is a pain. Very likely the issue in your case as well, @d0rc.

That makes more sense. I can see that an older driver in the VM works fine with a newer driver on the host, but not vice versa.