LostRuins / koboldcpp

A simple one-file way to run various GGML and GGUF models with KoboldAI's UI
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Can't make it work with PHI-3-MINI and OpenCL #970

Open 0wwafa opened 4 days ago

0wwafa commented 4 days ago

https://huggingface.co/ZeroWw/Phi-3-mini-128k-instruct-GGUF/blob/main/Phi-3-mini-128k-instruct.q5_k.gguf

llm_load_tensors: ggml ctx size =    0.24 MiB
llm_load_tensors: offloading 4 repeating layers to GPU
llm_load_tensors: offloaded 4/33 layers to GPU
llm_load_tensors:        CPU buffer size =  2918.26 MiB
llm_load_tensors:     OpenCL buffer size =   324.19 MiB
......................................................................................
Automatic RoPE Scaling: Using (scale:1.000, base:10000.0).
llama_new_context_with_model: n_ctx      = 8288
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:        CPU KV buffer size =  3108.00 MiB
llama_new_context_with_model: KV self size  = 3108.00 MiB, K (f16): 1554.00 MiB, V (f16): 1554.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.12 MiB
llama_new_context_with_model:        CPU compute buffer size =   570.19 MiB
llama_new_context_with_model: graph nodes  = 1286
llama_new_context_with_model: graph splits = 1
Traceback (most recent call last):
  File "koboldcpp.py", line 3783, in <module>
  File "koboldcpp.py", line 3445, in main
  File "koboldcpp.py", line 444, in load_model
OSError: exception: access violation reading 0x000000000510D000
[14532] Failed to execute script 'koboldcpp' due to unhandled exception!
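As a sanity check on the numbers in the log, the reported KV cache size is consistent with the usual Phi-3-mini shapes. This is a minimal sketch, assuming 32 repeating layers (the log shows 4/33 offloaded, i.e. 32 plus the output layer), hidden size 3072, and full multi-head attention (K/V width equal to the hidden size); those shape values come from the model card, not from this log.

```python
# Sketch: reproduce the KV cache sizes printed by llama_new_context_with_model,
# assuming Phi-3-mini shapes (32 layers, 3072 hidden, MHA so K/V dim = 3072).
n_ctx = 8288          # from the log: n_ctx = 8288
n_layer = 32          # assumed: repeating layers in Phi-3-mini
n_embd_kv = 3072      # assumed: K/V width per token (MHA, no GQA shrink)
bytes_f16 = 2         # f16 cache, 2 bytes per element

k_mib = n_ctx * n_layer * n_embd_kv * bytes_f16 / (1024 ** 2)
print(k_mib)       # K cache alone, matches "K (f16): 1554.00 MiB"
print(2 * k_mib)   # K + V together, matches "KV self size = 3108.00 MiB"
```

Since the sizes match, the cache allocation itself looks correct; the access violation happens after this point, during or after graph setup with the OpenCL backend.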

But if I don't use OpenCL, it works fine.
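For reference, the two cases can be reproduced with invocations along these lines (a sketch, assuming the standard koboldcpp CLI flags; the exact model path and CLBlast platform/device IDs will differ per machine):

```shell
# Crashes with the access violation above: OpenCL via CLBlast,
# 4 layers offloaded (matching "offloaded 4/33 layers" in the log)
python koboldcpp.py --model Phi-3-mini-128k-instruct.q5_k.gguf --useclblast 0 0 --gpulayers 4

# Works: same model, CPU only (no OpenCL offload)
python koboldcpp.py --model Phi-3-mini-128k-instruct.q5_k.gguf
```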