LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Error when generating any number of tokens above 2048 on capable models #347

Closed: paryska99 closed this issue 1 year ago

paryska99 commented 1 year ago

```
Processing Prompt [BLAS] (2130 / 2130 tokens)ggml_opencl: ggml_cl_h2d_tensor_2d(queue, d_Q, 0, src0, i03, i02, events.data() + ev_idx++) error -30 at ggml-opencl.cpp:1708 You may be out of VRAM. Please check if you have enough.

C:\Language_Model_Alpaca\koboldcpp>pause
Press any key to continue . . .
```

Tried models:
- LLongMA-2-7B-GGML
- openassistant-llama2-13b-orca-8k-3319.ggmlv3.q3_K_M.GGML

Windows 10, koboldcpp 1.37

Arguments: `--useclblast 0 0 --unbantokens --blasbatchsize 512 --threads 9 --launch --gpulayers 14 --usemlock --ropeconfig 0.5 10000`

When I tried it before with an earlier koboldcpp version (1.36) it also didn't work. Everything is fine until the prompt goes even 1 token over 2048.

Has anybody had similar problems?

paryska99 commented 1 year ago

Never mind, I was missing the argument `--contextsize 4096`.
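
For reference, a corrected launch command would combine the arguments from the original report with the missing flag. This is a sketch based only on the flags listed above (model path and layer counts are from the report; adjust for your own setup):

```shell
# Extend the context window to 4096 tokens. Without --contextsize, koboldcpp
# defaults to 2048, so any prompt longer than that overruns the allocated
# buffers even on models that support longer contexts.
# --ropeconfig 0.5 10000 applies RoPE scaling consistent with a 2x context.
koboldcpp.exe --useclblast 0 0 --unbantokens --blasbatchsize 512 ^
  --threads 9 --launch --gpulayers 14 --usemlock ^
  --ropeconfig 0.5 10000 --contextsize 4096
```

Note that `--contextsize` controls koboldcpp's own buffer allocation; the corresponding max context slider in the UI must also be raised to actually send prompts longer than 2048 tokens.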