LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp

Error while submitting prompt: TypeError: Failed to fetch #211

Closed: yesbroc closed this issue 1 year ago

yesbroc commented 1 year ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

Expected Behavior

Milliseconds after sending my prompt, it crashed and gave no reply.

Current Behavior

Exception occurred during processing of request from ('127.0.0.1', 51868)
Traceback (most recent call last):
  File "socketserver.py", line 316, in _handle_request_noblock
  File "socketserver.py", line 347, in process_request
  File "socketserver.py", line 360, in finish_request
  File "koboldcpp.py", line 223, in __call__
  File "http\server.py", line 651, in __init__
  File "socketserver.py", line 747, in __init__
  File "http\server.py", line 425, in handle
  File "http\server.py", line 413, in handle_one_request
  File "koboldcpp.py", line 324, in do_POST
  File "koboldcpp.py", line 171, in generate
TypeError: int expected instead of float
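For context, this exact message is what Python's ctypes raises when a float is passed where a C integer is expected. A minimal, self-contained illustration (not koboldcpp code, just a repro of the error class):

```python
import ctypes

# ctypes refuses to silently truncate a Python float into a C int.
# Passing one reproduces the exact error from the traceback above.
try:
    ctypes.c_int(0.4)  # e.g. a fractional top_k like the 0.4 sent in the request logs below
except TypeError as err:
    print(err)  # -> int expected instead of float
```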

Environment and Context

Windows 11 Home
koboldcpp.exe
Guanaco 13B GGML
16 GB RAM
No venv

Failure Information (for bugs)

The full console output, the request JSON, and the resulting traceback are reproduced verbatim under Failure Logs below.

Steps to Reproduce

Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.

  1. Load the model with smartcontext and CLBlast enabled

  2. Run the webui with the Kobold chat UI

  3. Try talking to it, I guess

Failure Logs

Windows PowerShell
Copyright (C) Microsoft Corporation. All rights reserved.

Install the latest PowerShell for new features and improvements! https://aka.ms/PSWindows

PS C:\Users\orijp\OneDrive\Desktop\chatgpts> koboldcpp.exe "C:\Users\orijp\OneDrive\Desktop\chatgpts\oobabooga_windows\oobabooga_windows\text-generation-webui\models\ggml-guanaco-13B.ggmlv3.q5_1.bin" --stream --useclblast 0 0 --gpulayers 7 --threads 12 --smartcontext
koboldcpp.exe : The term 'koboldcpp.exe' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1
Suggestion [3,General]: The command koboldcpp.exe was not found, but does exist in the current location. Windows PowerShell does not load commands from the current location by default. If you trust this command, instead type: ".\koboldcpp.exe". See "get-help about_Command_Precedence" for more details.
PS C:\Users\orijp\OneDrive\Desktop\chatgpts> ./koboldcpp.exe "C:\Users\orijp\OneDrive\Desktop\chatgpts\oobabooga_windows\oobabooga_windows\text-generation-webui\models\ggml-guanaco-13B.ggmlv3.q5_1.bin" --stream --useclblast 0 0 --gpulayers 7 --threads 12 --smartcontext
Welcome to KoboldCpp - Version 1.24
Attempting to use CLBlast library for faster prompt ingestion. A compatible clblast will be required.
Initializing dynamic library: koboldcpp_clblast.dll

Loading model: C:\Users\orijp\OneDrive\Desktop\chatgpts\oobabooga_windows\oobabooga_windows\text-generation-webui\models\ggml-guanaco-13B.ggmlv3.q5_1.bin
[Threads: 12, BlasThreads: 12, SmartContext: True]


Identified as LLAMA model: (ver 5)
Attempting to Load...

System Info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
llama.cpp: loading model from C:\Users\orijp\OneDrive\Desktop\chatgpts\oobabooga_windows\oobabooga_windows\text-generation-webui\models\ggml-guanaco-13B.ggmlv3.q5_1.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 9 (mostly Q5_1)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =    0.09 MB
llama_model_load_internal: mem required  = 11359.05 MB (+ 1608.00 MB per state)

Initializing CLBlast (First Run)...
Attempting to use: Platform=0, Device=0 (If invalid, program will crash)
Using Platform: NVIDIA CUDA Device: NVIDIA GeForce RTX 3050 Ti Laptop GPU FP16: 0
CL FP16 temporarily disabled pending further optimization.
llama_model_load_internal: [opencl] offloading 7 layers to GPU
llama_model_load_internal: [opencl] total VRAM used: 1588 MB
llama_init_from_file: kv self size = 1600.00 MB
Load Model OK: True
Embedded Kobold Lite loaded.
Starting Kobold HTTP Server on port 5001
Please connect to custom endpoint at http://localhost:5001
127.0.0.1 - - [04/Jun/2023 11:26:59] "GET /?streaming=1 HTTP/1.1" 200 -
127.0.0.1 - - [04/Jun/2023 11:27:00] "GET /api/v1/model HTTP/1.1" 200 -
127.0.0.1 - - [04/Jun/2023 11:27:00] "GET /api/v1/info/version HTTP/1.1" 200 -
127.0.0.1 - - [04/Jun/2023 11:27:00] "GET /sw.js HTTP/1.1" 404 -
127.0.0.1 - - [04/Jun/2023 11:27:00] "GET /manifest.json HTTP/1.1" 404 -

Input: {"n": 1, "max_context_length": 2048, "max_length": 8, "rep_pen": 1.25, "temperature": 0.5, "top_p": 0.3, "top_k": 0.4, "top_a": 0.95, "typical": 1, "tfs": 1, "rep_pen_range": 1024, "rep_pen_slope": 0.7, "sampler_order": [0, 1, 2, 3, 4, 5, 6], "prompt": "[The following is a chat message log between you and an extremely intelligent and knowledgeable AI system named KoboldGPT. KoboldGPT is a state-of-the-art Artificial General Intelligence. You may ask any question, or request any task, and KoboldGPT will always be able to respond accurately and truthfully.]\n\nYou: What are german shepherds?\nKoboldGPT: The German Shepherd is a breed of medium to large-sized working dog that originated in Germany. In the English language, the breed's officially recognized name is German Shepherd Dog. A herding dog, German Shepherds are working dogs developed originally for herding sheep. Since that time, however, because of their strength, intelligence, trainability, and obedience, German Shepherds around the world are often the preferred breed for many types of work, including disability assistance, search-and-rescue, police and military roles and acting.\nYou: Which is heavier, a duck or a car?\nKoboldGPT: A car weighs around 1300 kilograms, while a duck only weighs about 2 kilograms. Therefore, a car is heavier than a duck.\nYou: What is the atomic number of Calcium?\nKoboldGPT: A calcium atom has 20 protons, 20 electrons and 20 neutrons. The atomic number or proton number of a chemical element is the number of protons found in the nucleus of every atom of that element. The atomic number uniquely identifies a chemical element. It is identical to the charge number of the nucleus. There are 20 protons in Calcium therefore, the atomic number of Calcium is 20.\nYou: What is 28 + 13?\nKoboldGPT: 28 + 13 is equal to 41.\nYou: Tell me a joke.\nKoboldGPT: Why did the chicken cross the road? To get to the other side!\nYou: How is an earthquake measured?\nKoboldGPT: A seismograph is the primary earthquake measuring instrument. The seismograph produces a digital graphic recording of the ground motion caused by the seismic waves. The digital recording is called seismogram. A network of worldwide seismographs detects and measures the strength and duration of the earthquake's waves. The magnitude of an earthquake and the intensity of shaking is usually reported on the Richter scale.\n\nKoboldGPT: Hello, I am KoboldGPT, your personal AI assistant. What would you like to know?\nYou: o\nYou: o\nYou: frick\nKoboldGPT:", "quiet": true, "stop_sequence": ["You:"]}

Exception occurred during processing of request from ('127.0.0.1', 51985)
Traceback (most recent call last):
  File "socketserver.py", line 316, in _handle_request_noblock
  File "socketserver.py", line 347, in process_request
  File "socketserver.py", line 360, in finish_request
  File "koboldcpp.py", line 223, in __call__
  File "http\server.py", line 651, in __init__
  File "socketserver.py", line 747, in __init__
  File "http\server.py", line 425, in handle
  File "http\server.py", line 413, in handle_one_request
  File "koboldcpp.py", line 324, in do_POST
  File "koboldcpp.py", line 171, in generate
TypeError: int expected instead of float

LostRuins commented 1 year ago

I notice you are running an older version v1.24 of KoboldCpp.

Can you please update to the latest version, v1.28, and try again? If the error still occurs, please show me the full console output log. Thanks.

yesbroc commented 1 year ago
C:\Users\orijp\OneDrive\Desktop\chatgpts>koboldcpp.exe "C:\Users\orijp\OneDrive\Desktop\chatgpts\oobabooga_windows\oobabooga_windows\text-generation-webui\models\ggml-guanaco-13B.ggmlv3.q5_1.bin"  --stream --useclblast 0 0 --gpulayers 7 --threads 12
Welcome to KoboldCpp - Version 1.28
Attempting to use CLBlast library for faster prompt ingestion. A compatible clblast will be required.
Initializing dynamic library: koboldcpp_clblast.dll
==========
Loading model: C:\Users\orijp\OneDrive\Desktop\chatgpts\oobabooga_windows\oobabooga_windows\text-generation-webui\models\ggml-guanaco-13B.ggmlv3.q5_1.bin
[Threads: 12, BlasThreads: 12, SmartContext: False]

---
Identified as LLAMA model: (ver 5)
Attempting to Load...
---
System Info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
llama.cpp: loading model from C:\Users\orijp\OneDrive\Desktop\chatgpts\oobabooga_windows\oobabooga_windows\text-generation-webui\models\ggml-guanaco-13B.ggmlv3.q5_1.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 9 (mostly Q5_1)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =    0.09 MB

Platform:0 Device:0  - NVIDIA CUDA with NVIDIA GeForce RTX 3050 Ti Laptop GPU
Platform:1 Device:0  - AMD Accelerated Parallel Processing with gfx90c
Platform:2 Device:0  - OpenCLOn12 with AMD Radeon(TM) Graphics
Platform:2 Device:1  - OpenCLOn12 with NVIDIA GeForce RTX 3050 Ti Laptop GPU
Platform:2 Device:2  - OpenCLOn12 with Microsoft Basic Render Driver

ggml_opencl: selecting platform: 'NVIDIA CUDA'
ggml_opencl: selecting device: 'NVIDIA GeForce RTX 3050 Ti Laptop GPU'
ggml_opencl: device FP16 support: false
CL FP16 temporarily disabled pending further optimization.
llama_model_load_internal: using OpenCL for GPU acceleration
llama_model_load_internal: mem required  = 9770.65 MB (+ 1608.00 MB per state)
llama_model_load_internal: offloading 7 layers to GPU
llama_model_load_internal: total VRAM used: 1588 MB
...................
llama_init_from_file: kv self size  = 1600.00 MB
Load Model OK: True
Embedded Kobold Lite loaded.
Starting Kobold HTTP Server on port 5001
Please connect to custom endpoint at http://localhost:5001
127.0.0.1 - - [04/Jun/2023 23:42:46] "GET / HTTP/1.1" 302 -
Force redirect to streaming mode, as --stream is set.
127.0.0.1 - - [04/Jun/2023 23:42:46] "GET /?streaming=1 HTTP/1.1" 200 -
127.0.0.1 - - [04/Jun/2023 23:42:46] "GET /api/v1/model HTTP/1.1" 200 -
127.0.0.1 - - [04/Jun/2023 23:42:46] "GET /api/v1/info/version HTTP/1.1" 200 -
127.0.0.1 - - [04/Jun/2023 23:42:46] "GET /sw.js HTTP/1.1" 404 -
127.0.0.1 - - [04/Jun/2023 23:42:46] "GET /manifest.json HTTP/1.1" 404 -

Input: {"n": 1, "max_context_length": 2048, "max_length": 8, "rep_pen": 1.25, "temperature": 0.5, "top_p": 0.3, "top_k": 0.4, "top_a": 0.95, "typical": 1, "tfs": 1, "rep_pen_range": 1024, "rep_pen_slope": 0.7, "sampler_order": [0, 1, 2, 3, 4, 5, 6], "prompt": "[The following is a chat message log between you and an extremely intelligent and knowledgeable AI system named KoboldGPT. KoboldGPT is a state-of-the-art Artificial General Intelligence. You may ask any question, or request any task, and KoboldGPT will always be able to respond accurately and truthfully.]\n\nYou: What are german shepherds?\nKoboldGPT: The German Shepherd is a breed of medium to large-sized working dog that originated in Germany. In the English language, the breed's officially recognized name is German Shepherd Dog. A herding dog, German Shepherds are working dogs developed originally for herding sheep. Since that time, however, because of their strength, intelligence, trainability, and obedience, German Shepherds around the world are often the preferred breed for many types of work, including disability assistance, search-and-rescue, police and military roles and acting.\nYou: Which is heavier, a duck or a car?\nKoboldGPT: A car weighs around 1300 kilograms, while a duck only weighs about 2 kilograms. Therefore, a car is heavier than a duck.\nYou: What is the atomic number of Calcium?\nKoboldGPT: A calcium atom has 20 protons, 20 electrons and 20 neutrons. The atomic number or proton number of a chemical element is the number of protons found in the nucleus of every atom of that element. The atomic number uniquely identifies a chemical element. It is identical to the charge number of the nucleus. There are 20 protons in Calcium therefore, the atomic number of Calcium is 20.\nYou: What is 28 + 13?\nKoboldGPT: 28 + 13 is equal to 41.\nYou: Tell me a joke.\nKoboldGPT: Why did the chicken cross the road? To get to the other side!\nYou: How is an earthquake measured?\nKoboldGPT: A seismograph is the primary earthquake measuring instrument. The seismograph produces a digital graphic recording of the ground motion caused by the seismic waves. The digital recording is called seismogram. A network of worldwide seismographs detects and measures the strength and duration of the earthquake's waves. The magnitude of an earthquake and the intensity of shaking is usually reported on the Richter scale.\n\nKoboldGPT: Hello, I am KoboldGPT, your personal AI assistant. What would you like to know?\nYou: fr\nYou: ok\nYou: hi\nYou: crigne\nKoboldGPT:", "quiet": true, "stop_sequence": ["You:"]}
----------------------------------------
Exception happened during processing of request from ('127.0.0.1', 65491)
Traceback (most recent call last):
  File "socketserver.py", line 316, in _handle_request_noblock
  File "socketserver.py", line 347, in process_request
  File "socketserver.py", line 360, in finish_request
  File "koboldcpp.py", line 225, in __call__
  File "http\server.py", line 647, in __init__
  File "socketserver.py", line 747, in __init__
  File "http\server.py", line 427, in handle
  File "http\server.py", line 415, in handle_one_request
  File "koboldcpp.py", line 326, in do_POST
  File "koboldcpp.py", line 172, in generate
TypeError: int expected instead of float
----------------------------------------
yesbroc commented 1 year ago

How could I reinstall?

LostRuins commented 1 year ago

This bug will be fixed in the new version. For now, you can fix it by setting topK back to 0.
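For reference, the request logs above show "top_k": 0.4, a float, where the backend binding expects an integer, which matches the traceback. A minimal sketch of the kind of server-side coercion that would guard against this (a hypothetical helper, not the actual koboldcpp patch):

```python
def sanitize_genparams(genparams: dict) -> dict:
    """Hypothetical helper: coerce numeric sampler fields to the types
    the C backend expects before they are handed to ctypes.
    Field names follow the request JSON shown in the logs above."""
    # top_k is a token count, so it must be an int; a float like 0.4
    # makes ctypes raise "int expected instead of float".
    genparams["top_k"] = int(genparams.get("top_k", 0))
    # max_length and max_context_length are integer counts as well.
    genparams["max_length"] = int(genparams.get("max_length", 80))
    genparams["max_context_length"] = int(genparams.get("max_context_length", 2048))
    return genparams
```

With coercion like this in place, a UI that sends a fractional top_k would degrade gracefully (0.4 truncates to 0, i.e. top-k disabled, matching the suggested workaround) instead of crashing the request handler.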

LostRuins commented 1 year ago

Hi, can you please try the latest version? This should be fixed now.

yesbroc commented 1 year ago

That fixed the issue, thanks!