abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

ERROR: GGML_ASSERT: D:\a\llama-cpp-python-cuBLAS-wheels\llama-cpp-python-cuBLAS-wheels\vendor\llama.cpp\ggml-cuda.cu:5925: false #812


JeisonJimenezA commented 1 year ago

When loading the model I get the following error message:

```
llm_load_tensors: ggml ctx size = 0.16 MB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required  = 5734.11 MB
llm_load_tensors: offloading 20 repeating layers to GPU
llm_load_tensors: offloaded 20/43 layers to GPU
llm_load_tensors: VRAM used: 5266.66 MB
................................................
GGML_ASSERT: D:\a\llama-cpp-python-cuBLAS-wheels\llama-cpp-python-cuBLAS-wheels\vendor\llama.cpp\ggml-cuda.cu:5925: false
```

My llama-cpp-python version is 0.2.11+cu117.
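
For context, the assert fires while the `Llama` constructor loads the model with partial GPU offload. A minimal sketch of the kind of call that produces the log above, assuming default settings otherwise (the model path is an illustrative placeholder, not from the report):

```python
from llama_cpp import Llama

# Hypothetical reproduction: a GGUF/GGML model loaded with partial offload.
# The path is a placeholder; n_gpu_layers=20 matches the
# "offloaded 20/43 layers to GPU" line in the log above.
llm = Llama(
    model_path="./model.gguf",  # placeholder path
    n_gpu_layers=20,            # offload 20 of the model's layers to CUDA
)
```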

An-nym-us commented 7 months ago

I too am getting this error:

```
ggml_cuda_init: found 3 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes
  Device 1: Tesla P40, compute capability 6.1, VMM: no
  Device 2: Tesla P40, compute capability 6.1, VMM: no
llm_load_tensors: ggml ctx size = 1.43 MiB
GGML_ASSERT: D:\a\llama-cpp-python-cuBLAS-wheels\llama-cpp-python-cuBLAS-wheels\vendor\llama.cpp\ggml-cuda.cu:727: tensor->view_src == nullptr
```
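
This setup mixes cards of different compute capabilities (8.6 and 6.1), which can exercise different CUDA code paths than a single-GPU load. One diagnostic step, sketched below under the assumption that the failure is tied to the multi-GPU path, is to hide all but one device before the CUDA backend initializes:

```python
import os

# Restrict the process to device 0 (the RTX 3090 Ti) so llama.cpp
# never initializes the P40s. This must be set before llama_cpp is
# imported, since CUDA reads it at initialization time.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from llama_cpp import Llama

llm = Llama(
    model_path="./model.gguf",  # placeholder path
    n_gpu_layers=-1,            # offload all layers to the one visible device
)
```

If the model loads cleanly this way, that points at the multi-GPU split rather than the model file itself; `Llama` also accepts a `tensor_split` parameter for controlling how layers are distributed across the visible devices.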