LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

[Issue] Loading any model on 1.32 crashes saying I'm out of VRAM #255

Closed bartowski1182 closed 1 year ago

bartowski1182 commented 1 year ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

Expected Behavior

A 7B q2_K model should easily fit all of its layers into my 3060 with 12GB of VRAM

Current Behavior

Even loading the 7B q2_K model with only 20 layers offloaded results in an error saying I'm out of VRAM

Environment and Context

Ryzen 3600 with 64GB RAM; RTX 3060 with 12GB VRAM

Steps to Reproduce

Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.

  1. Load with parameters --model vicuna-7b-v1.3.ggmlv3.q2_K.bin --threads 1 --useclblast 0 0 --gpulayers 20
  2. Check the logs as it attempts to load and find the warning that I may be out of VRAM

Failure Logs

koboldcpp    | llama.cpp: loading model from /app/models/vicuna-7b-v1.3.ggmlv3.q2_K.bin
koboldcpp    | llama_model_load_internal: format     = ggjt v3 (latest)
koboldcpp    | llama_model_load_internal: n_vocab    = 32000
koboldcpp    | llama_model_load_internal: n_ctx      = 2048
koboldcpp    | llama_model_load_internal: n_embd     = 4096
koboldcpp    | llama_model_load_internal: n_mult     = 256
koboldcpp    | llama_model_load_internal: n_head     = 32
koboldcpp    | llama_model_load_internal: n_layer    = 32
koboldcpp    | llama_model_load_internal: n_rot      = 128
koboldcpp    | llama_model_load_internal: ftype      = 10 (mostly Q2_K)
koboldcpp    | llama_model_load_internal: n_ff       = 11008
koboldcpp    | llama_model_load_internal: n_parts    = 1
koboldcpp    | llama_model_load_internal: model size = 7B
koboldcpp    | llama_model_load_internal: ggml ctx size =    0.07 MB
koboldcpp    | ggml_opencl: clGetPlatformIDs(NPLAT, platform_ids, &n_platforms) error -1001 at ggml-opencl.cpp:787
koboldcpp    | You may be out of VRAM. Please check if you have enough.
koboldcpp    |
koboldcpp    | ---
koboldcpp    | Identified as LLAMA model: (ver 5)
koboldcpp    | Attempting to Load...
koboldcpp    | ---
koboldcpp    | System Info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
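The telling line in the log is the clGetPlatformIDs error -1001, which corresponds to CL_PLATFORM_NOT_FOUND_KHR from the OpenCL ICD loader: no OpenCL platform was found at all, so the generic "out of VRAM" message is misleading here. A minimal standalone check is sketched below (an assumption-laden example, not part of koboldcpp; it assumes the OpenCL headers and ICD loader are installed, e.g. via the ocl-icd-opencl-dev package on Ubuntu):

```cpp
// Sketch: enumerate OpenCL platforms the way a CLBlast backend must do
// before picking a device. If the ICD loader finds no platforms (missing
// GPU driver, or a container without GPU access), clGetPlatformIDs
// returns CL_PLATFORM_NOT_FOUND_KHR (-1001) rather than a memory error.
// Build (assumed command): g++ cl_check.cpp -lOpenCL -o cl_check
#include <CL/cl.h>
#include <cstdio>
#include <vector>

int main() {
    cl_uint n_platforms = 0;
    cl_int err = clGetPlatformIDs(0, nullptr, &n_platforms);
    if (err != CL_SUCCESS || n_platforms == 0) {
        std::printf("clGetPlatformIDs failed (err=%d): no OpenCL platform visible\n", err);
        return 1;
    }

    std::vector<cl_platform_id> platforms(n_platforms);
    clGetPlatformIDs(n_platforms, platforms.data(), nullptr);

    for (cl_uint i = 0; i < n_platforms; ++i) {
        char name[256] = {0};
        clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME, sizeof(name), name, nullptr);
        std::printf("platform %u: %s\n", i, name);
    }
    return 0;
}
```

If a check like this lists no platforms inside the container but works on the host, the runtime simply cannot see a GPU driver, which is consistent with the eventual resolution below (wrong image).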
LostRuins commented 1 year ago

Are you running the exe? Building from source (did you run make clean)? Windows or Linux?

bartowski1182 commented 1 year ago

Good questions: building from source with a clean make; tried both make LLAMA_CUBLAS=1 and make LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1; Ubuntu 22.04.

It's not even attempting to load anything onto the GPU; if I watch it with nvidia-smi, usage stays at 1 MB.

bartowski1182 commented 1 year ago

Sigh, ignore me, I was using the wrong image... spent a solid 30 minutes debugging a typo, love it.