0wwafa opened 4 days ago
Loading this model with OpenCL crashes on startup:
https://huggingface.co/ZeroWw/Phi-3-mini-128k-instruct-GGUF/blob/main/Phi-3-mini-128k-instruct.q5_k.gguf
```
llm_load_tensors: ggml ctx size = 0.24 MiB
llm_load_tensors: offloading 4 repeating layers to GPU
llm_load_tensors: offloaded 4/33 layers to GPU
llm_load_tensors: CPU buffer size = 2918.26 MiB
llm_load_tensors: OpenCL buffer size = 324.19 MiB
......................................................................................
Automatic RoPE Scaling: Using (scale:1.000, base:10000.0).
llama_new_context_with_model: n_ctx = 8288
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 3108.00 MiB
llama_new_context_with_model: KV self size = 3108.00 MiB, K (f16): 1554.00 MiB, V (f16): 1554.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.12 MiB
llama_new_context_with_model: CPU compute buffer size = 570.19 MiB
llama_new_context_with_model: graph nodes = 1286
llama_new_context_with_model: graph splits = 1
Traceback (most recent call last):
  File "koboldcpp.py", line 3783, in <module>
  File "koboldcpp.py", line 3445, in main
  File "koboldcpp.py", line 444, in load_model
OSError: exception: access violation reading 0x000000000510D000
[14532] Failed to execute script 'koboldcpp' due to unhandled exception!
```
But if I don't use OpenCL, it works.
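For reference, a minimal sketch of the two invocations. The flag names come from koboldcpp's CLI; the exact values here are assumptions reconstructed from the log above (4 offloaded layers, ~8k context), not the reporter's actual command line:

```shell
# Crashes with the access violation above: OpenCL/CLBlast offload enabled.
# "0 0" selects OpenCL platform 0, device 0 (assumed).
python koboldcpp.py --model Phi-3-mini-128k-instruct.q5_k.gguf \
    --useclblast 0 0 --gpulayers 4 --contextsize 8192

# Works: same model and context size, no OpenCL acceleration.
python koboldcpp.py --model Phi-3-mini-128k-instruct.q5_k.gguf \
    --contextsize 8192
```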