LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Error when generating any number of tokens above 2048 on capable models #347

Closed: paryska99 closed this issue 1 year ago

paryska99 commented 1 year ago

```
Processing Prompt [BLAS] (2130 / 2130 tokens)ggml_opencl: ggml_cl_h2d_tensor_2d(queue, d_Q, 0, src0, i03, i02, events.data() + ev_idx++) error -30 at ggml-opencl.cpp:1708 You may be out of VRAM. Please check if you have enough.

C:\Language_Model_Alpaca\koboldcpp>pause
Press any key to continue . . .
```

Tried models:
- LLongMA-2-7B-GGML
- openassistant-llama2-13b-orca-8k-3319.ggmlv3.q3_K_M.GGML

Windows 10, koboldcpp 1.37

Arguments: `--useclblast 0 0 --unbantokens --blasbatchsize 512 --threads 9 --launch --gpulayers 14 --usemlock --ropeconfig 0.5 10000`

When I tried it before with an earlier koboldcpp version (1.36) it also didn't work. Everything is fine until the prompt goes even 1 token over 2048.

Has anybody had similar problems?

paryska99 commented 1 year ago

Never mind, I was missing the argument `--contextsize 4096`.
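
For reference, a corrected launch command would combine the arguments from the original report with the missing flag. This is a sketch based only on the flags listed above (model path and layer counts are from the report; adjust for your own setup):

```shell
# Extend the context window to 4096 tokens. Without --contextsize, koboldcpp
# defaults to 2048, so any prompt longer than that overruns the allocated
# buffers even on models that support longer contexts.
# --ropeconfig 0.5 10000 applies RoPE scaling consistent with a 2x context.
koboldcpp.exe --useclblast 0 0 --unbantokens --blasbatchsize 512 ^
  --threads 9 --launch --gpulayers 14 --usemlock ^
  --ropeconfig 0.5 10000 --contextsize 4096
```

Note that `--contextsize` controls koboldcpp's own buffer allocation; the corresponding max context slider in the UI must also be raised to actually send prompts longer than 2048 tokens.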