Prerequisites

Please answer the following questions for yourself before submitting an issue.

[X (1.33)] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
[X] I carefully followed the README.md.
[X] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
[X] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

I am expecting output from the webui beyond 100 seconds of generation.

Current Behavior

I am trying to generate from a prompt of 2300+ in length to test models and their functionality with this service, in order to determine the best service for a text generation webui running on my homelab. The error appears exactly 100 seconds after submitting a prompt. I expect generation to take longer than 100 seconds for a prompt of this size, and I am trying to keep the webui from erroring out at the 100-second mark. The terminal on the server side shows the prompt and the response with no problem. Only the webui portion errors out and does not display the response.

Context size: 4096
Model: GGML SuperHOT Pygmalion

Environment and Context

OS: Debian 11
CPU: Xeon E5-2670v3 (25C Virtualized)
RAM: 80GB
GPU: None

Physical (or virtual) hardware you are using, e.g. for Linux:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 25
On-line CPU(s) list: 0-24
Thread(s) per core: 1
Core(s) per socket: 25
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
Stepping: 2
CPU MHz: 2299.998
BogoMIPS: 4599.99
Virtualization: VT-x
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 800 KiB
L1i cache: 800 KiB
L2 cache: 100 MiB
L3 cache: 16 MiB
NUMA node0 CPU(s): 0-24
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Mitigation; PTE Inversion; VMX flush not necessary, SMT disabled
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS FW, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt arat umip md_clear arch_capabilities

Operating System, e.g. for Linux:

Linux debian 5.10.0-23-amd64 #1 SMP Debian 5.10.179-1 (2023-05-12) x86_64 GNU/Linux

Browsers used to test error

Brave: 1.52.126
Chrome: 114.0.5735.199
Firefox: 113.0.1, but there the error is instead: Error while submitting prompt: SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data

SDK version, e.g. for Linux:

Python 3.9.2
GNU Make 4.3
Built for x86_64-pc-linux-gnu
g++ (Debian 10.2.1-6) 10.2.1 20210110

Failure Information (for bugs)

Server: Generate: The response could not be sent, maybe connection was terminated?
Client: Error while submitting prompt: SyntaxError: Unexpected token '<', "DOCTYPE"... is not a valid JSON
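Both client messages point to the same thing: the webui received a body that starts with `<` (an HTML page) where it expected JSON. The following is only a minimal sketch of that failure mode, not the webui's actual code. The helper name `classify_body` and the sample bodies are hypothetical. Which layer returns the HTML page (Nginx, Cloudflare, or something else) is not established by this report.

```python
import json


def classify_body(body: str) -> str:
    """Classify a response body as 'json', 'html', or 'other'.

    Hypothetical helper for illustration only; it is not part of KoboldCpp.
    """
    try:
        json.loads(body)
        return "json"
    except json.JSONDecodeError:
        # This is the Python analogue of the browser's JSON.parse SyntaxError.
        pass
    # An HTML error page typically begins with a doctype or an <html> tag.
    if body.lstrip().lower().startswith(("<!doctype", "<html")):
        return "html"
    return "other"


# Made-up sample bodies: a JSON reply and an HTML error page.
print(classify_body('{"results": [{"text": "hello"}]}'))
print(classify_body("<!DOCTYPE html><html><body>timeout</body></html>"))
```

A check like this on the raw response text, before parsing, would let a client report "got an HTML page instead of JSON" rather than a bare SyntaxError.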

Steps to Reproduce

Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.

Build and deploy the service using Nginx and Cloudflare
Load pygmalion-13b-superhot-8k.ggmlv3.q4_K_M.bin
Generate a prompt over 2048 in length.
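To narrow down where the 100-second cutoff comes from, the same request can be timed outside the browser. This is a rough sketch under assumptions: it assumes the server accepts a POST to `/api/v1/generate` with `prompt` and `max_length` fields (the KoboldAI-style API), and the helper names `build_generate_request` and `timed_post` are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request


def build_generate_request(base_url: str, prompt: str, max_length: int = 80) -> urllib.request.Request:
    """Build a POST request for the (assumed) /api/v1/generate endpoint."""
    payload = json.dumps({"prompt": prompt, "max_length": max_length}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def timed_post(req: urllib.request.Request, timeout: float = 600.0):
    """Send the request; return (HTTP status, elapsed seconds, body text)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Non-2xx replies (for example a proxy error page) still carry a body.
        status = err.code
        body = err.read().decode("utf-8", errors="replace")
    return status, time.monotonic() - start, body


# Example (not executed here): compare the direct and the proxied route.
# req = build_generate_request("http://localhost:6969", "long prompt here")
# status, seconds, body = timed_post(req)
# print(status, round(seconds, 1), body[:80])
```

If the direct request to `http://localhost:6969` completes after more than 100 seconds while the same request through the public hostname fails at 100 seconds with an HTML body, the cutoff is likely in the proxy layer rather than in KoboldCpp. Cloudflare documents a default 100-second limit for proxied requests (error 524), which would match the timing seen here, though this report does not confirm it.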

Failure Logs

username@debian:/media/username/Storage/koboldcpp-1.33$ python3 koboldcpp.py pygmalion-13b-superhot-8k.ggmlv3.q4_K_M.bin 6969 --contextsize 4096
Welcome to KoboldCpp - Version 1.33
Attempting to use OpenBLAS library for faster prompt ingestion. A compatible libopenblas will be required.
Initializing dynamic library: koboldcpp_openblas.so

Loading model: /media/username/Storage/koboldcpp-1.33/pygmalion-13b-superhot-8k.ggmlv3.q4_K_M.bin [Threads: 11, BlasThreads: 11, SmartContext: False]

Identified as LLAMA model: (ver 5) Attempting to Load...

System Info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
llama.cpp: loading model from /media/username/Storage/koboldcpp-1.33/pygmalion-13b-superhot-8k.ggmlv3.q4_K_M.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 4096
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 15 (mostly Q4_K - Medium)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 0.09 MB
llama_model_load_internal: mem required = 10572.94 MB (+ 1608.00 MB per state)
llama_new_context_with_model: kv self size = 3200.00 MB
Load Model OK: True
Embedded Kobold Lite loaded.
Starting Kobold HTTP Server on port 6969
Please connect to custom endpoint at http://localhost:6969

LostRuins / koboldcpp

[theman23290] Error while submitting prompt: SyntaxError: Unexpected token '<', "DOCTYPE"... is not a valid JSON #285