LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Failed to execute script 'koboldcpp' due to unhandled exception! #296

Closed · Enferlain closed this issue 1 year ago

Enferlain commented 1 year ago
D:\textgen\kobold>.\koboldcpp.exe --useclblast 0 0 --smartcontext --threads 16 --blasthreads 24 --stream --gpulayers 43 --contextsize 4096 --unbantokens
Welcome to KoboldCpp - Version 1.33
For command line arguments, please refer to --help
Otherwise, please manually select ggml file:
Attempting to use CLBlast library for faster prompt ingestion. A compatible clblast will be required.
Initializing dynamic library: koboldcpp_clblast.dll
==========
Loading model: D:\textgen\models\13b\chronos-hermes-13b.ggmlv3.q4_1.bin
[Threads: 16, BlasThreads: 24, SmartContext: True]

---
Identified as LLAMA model: (ver 5)
Attempting to Load...
---
System Info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
llama.cpp: loading model from D:\textgen\models\13b\chronos-hermes-13b.ggmlv3.q4_1.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32001
llama_model_load_internal: n_ctx      = 4096
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 3 (mostly Q4_1)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =    0.09 MB

Platform:0 Device:0  - AMD Accelerated Parallel Processing with gfx1030

ggml_opencl: selecting platform: 'AMD Accelerated Parallel Processing'
ggml_opencl: selecting device: 'gfx1030'
ggml_opencl: device FP16 support: true
CL FP16 temporarily disabled pending further optimization.
llama_model_load_internal: using OpenCL for GPU acceleration
llama_model_load_internal: mem required  = 3267.41 MB (+ 1608.00 MB per state)
llama_model_load_internal: offloading 40 repeating layers to GPU
llama_model_load_internal: offloading non-repeating layers to GPU
llama_model_load_internal: offloading v cache to GPU
llama_model_load_internal: offloading k cache to GPU
llama_model_load_internal: offloaded 43/43 layers to GPU
llama_model_load_internal: total VRAM used: 9173 MB
llama_new_context_with_model: kv self size  = 3200.00 MB
Load Model OK: True
Embedded Kobold Lite loaded.
Starting Kobold HTTP Server on port 5001
Please connect to custom endpoint at http://localhost:5001
Traceback (most recent call last):
  File "koboldcpp.py", line 881, in <module>
  File "koboldcpp.py", line 839, in main
  File "koboldcpp.py", line 528, in RunServerMultiThreaded
OSError: [WinError 10013] An attempt was made to access a socket in a way forbidden by its access permissions
[47468] Failed to execute script 'koboldcpp' due to unhandled exception!

I was running this exact command line yesterday and before without any problems. Any idea what could be causing the error? I tried searching past issues but didn't find anything.

Enferlain commented 1 year ago

Never mind: the port was already being used by another process somehow. I had to terminate it, and it works now.
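
For anyone who lands here with the same WinError 10013: it usually means the port is already bound by another process, though firewall rules or Windows' excluded port ranges can also trigger it. Below is a minimal standalone sketch, not part of koboldcpp, that checks whether the default port 5001 is free before launching; it assumes only the Python standard library, KoboldCpp's default port, and its --port option for picking a different one.

import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds, i.e. some
        # process is already bound to and accepting on this port.
        return s.connect_ex((host, port)) != 0

if __name__ == "__main__":
    port = 5001  # KoboldCpp's default HTTP port
    if port_is_free(port):
        print(f"Port {port} looks free; koboldcpp should be able to bind it.")
    else:
        print(f"Port {port} is in use. Find the owner with "
              f"'netstat -ano | findstr :{port}', stop it with "
              f"'taskkill /PID <pid> /F', or pass --port to use another one.")

On Windows, the netstat and taskkill commands in that message are the usual way to identify and terminate whichever process is holding the port.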