LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0
4.85k stars, 342 forks

DRY and Nemo problem #1093

Closed Danik-droid closed 3 weeks ago

Danik-droid commented 3 weeks ago

With models based on Mistral Nemo, enabling 'DRY Repetition Penalty' adds about 20 seconds of initialization time to every generation on a Radeon RX 6900 XT (Vulkan backend).

The ROCm build does not have this problem.

With Dry on:

CtxLimit:1065/4096, Amt:57/400, Init:18.49s, Process:2.38s (2.4ms/T = 423.53T/s), Generate:1.59s (27.8ms/T = 35.94T/s), Total:3.97s (14.37T/s)

Dry off:

CtxLimit:1082/4096, Amt:74/400, Init:0.01s, Process:2.37s (2.4ms/T = 425.14T/s), Generate:2.07s (28.0ms/T = 35.77T/s), Total:4.44s (16.67T/s)
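For anyone comparing the two timing lines, here is a small sketch that pulls the numbers out and computes the extra Init time. The helper name `parse_timings` is made up for illustration, and the regex is only an assumption about the log format based on the two lines above. It also suggests that the reported Total is Process plus Generate and does not include Init.

```python
import re

# Timing lines copied verbatim from the report above.
DRY_ON = ("CtxLimit:1065/4096, Amt:57/400, Init:18.49s, Process:2.38s (2.4ms/T = 423.53T/s), "
          "Generate:1.59s (27.8ms/T = 35.94T/s), Total:3.97s (14.37T/s)")
DRY_OFF = ("CtxLimit:1082/4096, Amt:74/400, Init:0.01s, Process:2.37s (2.4ms/T = 425.14T/s), "
           "Generate:2.07s (28.0ms/T = 35.77T/s), Total:4.44s (16.67T/s)")


def parse_timings(line):
    """Hypothetical helper: return the Init/Process/Generate/Total durations in seconds."""
    pairs = re.findall(r"(Init|Process|Generate|Total):([0-9.]+)s", line)
    return {name: float(value) for name, value in pairs}


on = parse_timings(DRY_ON)
off = parse_timings(DRY_OFF)

# Extra initialization time attributable to enabling DRY in this report.
init_overhead = round(on["Init"] - off["Init"], 2)
print("Init overhead (s):", init_overhead)

# In both lines, Process + Generate matches Total, so Init is reported separately.
for label, t in (("DRY on", on), ("DRY off", off)):
    print(label, round(t["Process"] + t["Generate"], 2), t["Total"])
```

Prompt processing and generation speed are practically identical in both runs, so the whole difference sits in the Init phase.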

Additional Information: Windows 10 pro, 22H2 (latest AMD drivers), [koboldcpp-1.73.1 - nocuda]

Log, DRY off:

`*** Welcome to KoboldCpp - Version 1.73.1 For command line arguments, please refer to --help


Auto Selected Vulkan Backend...

Trying to automatically determine GPU layers... Auto Recommended Layers: 43 Attempting to use Vulkan library for faster prompt ingestion. A compatible Vulkan will be required. Initializing dynamic library: koboldcpp_vulkan.dll

Namespace(benchmark=None, blasbatchsize=512, blasthreads=7, chatcompletionsadapter=None, config=None, contextsize=4096, debugmode=0, flashattention=False, forceversion=0, foreground=False, gpulayers=43, highpriority=False, hordeconfig=None, hordegenlen=0, hordekey='', hordemaxctx=0, hordemodelname='', hordeworkername='', host='', ignoremissing=False, launch=False, lora=None, mmproj=None, model='', model_param='Z:/LLM/models/TheDrummer/Rocinante-12B-v1.1-GGUF/Rocinante-12B-v1.1-Q6_K.gguf', multiuser=1, noavx2=False, noblas=False, nocertify=False, nommap=False, noshift=True, onready='', password=None, port=5001, port_param=5001, preloadstory=None, prompt='', promptlimit=100, quantkv=0, quiet=False, remotetunnel=False, ropeconfig=[0.0, 10000.0], sdclamped=0, sdconfig=None, sdlora='', sdloramult=1.0, sdmodel='', sdquant=False, sdthreads=7, sdvae='', sdvaeauto=False, skiplauncher=False, smartcontext=False, ssl=None, tensor_split=None, threads=7, unpack='', useclblast=None, usecublas=None, usemlock=False, usevulkan=[0], whispermodel='')

Loading model: Z:\LLM\models\TheDrummer\Rocinante-12B-v1.1-GGUF\Rocinante-12B-v1.1-Q6_K.gguf

The reported GGUF Arch is: llama Arch Category: 0


Identified as GGUF model: (ver 6) Attempting to Load...

Using automatic RoPE scaling for GGUF. If the model has custom RoPE settings, they'll be used directly instead! It means that the RoPE values written above will be replaced by the RoPE values indicated after loading. System Info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | llama_model_loader: loaded meta data with 31 key-value pairs and 363 tensors from Z:\LLM\models\TheDrummer\Rocinante-12B-v1.1-GGUF\Rocinante-12B-v1.1-Q6_K.gguf (version GGUF V3 (latest)) llm_load_vocab: special tokens cache size = 1000 llm_load_vocab: token to piece cache size = 0.8498 MB llm_load_print_meta: format = GGUF V3 (latest) llm_load_print_meta: arch = llama llm_load_print_meta: vocab type = BPE llm_load_print_meta: n_vocab = 131072 llm_load_print_meta: n_merges = 269443 llm_load_print_meta: vocab_only = 0 llm_load_print_meta: n_ctx_train = 1024000 llm_load_print_meta: n_embd = 5120 llm_load_print_meta: n_layer = 40 llm_load_print_meta: n_head = 32 llm_load_print_meta: n_head_kv = 8 llm_load_print_meta: n_rot = 128 llm_load_print_meta: n_swa = 0 llm_load_print_meta: n_embd_head_k = 128 llm_load_print_meta: n_embd_head_v = 128 llm_load_print_meta: n_gqa = 4 llm_load_print_meta: n_embd_k_gqa = 1024 llm_load_print_meta: n_embd_v_gqa = 1024 llm_load_print_meta: f_norm_eps = 0.0e+00 llm_load_print_meta: f_norm_rms_eps = 1.0e-05 llm_load_print_meta: f_clamp_kqv = 0.0e+00 llm_load_print_meta: f_max_alibi_bias = 0.0e+00 llm_load_print_meta: f_logit_scale = 0.0e+00 llm_load_print_meta: n_ff = 14336 llm_load_print_meta: n_expert = 0 llm_load_print_meta: n_expert_used = 0 llm_load_print_meta: causal attn = 1 llm_load_print_meta: pooling type = 0 llm_load_print_meta: rope type = 0 llm_load_print_meta: rope scaling = linear llm_load_print_meta: freq_base_train = 1000000.0 
llm_load_print_meta: freq_scale_train = 1 llm_load_print_meta: n_ctx_orig_yarn = 1024000 llm_load_print_meta: rope_finetuned = unknown llm_load_print_meta: ssm_d_conv = 0 llm_load_print_meta: ssm_d_inner = 0 llm_load_print_meta: ssm_d_state = 0 llm_load_print_meta: ssm_dt_rank = 0 llm_load_print_meta: ssm_dt_b_c_rms = 0 llm_load_print_meta: model type = 13B llm_load_print_meta: model ftype = unknown, may not work llm_load_print_meta: model params = 12.25 B llm_load_print_meta: model size = 9.36 GiB (6.56 BPW) llm_load_print_meta: general.name = E:\mergekit\mistralaiMistral Nemo Base 2407 llm_load_print_meta: BOS token = 1 '' llm_load_print_meta: EOS token = 2 '' llm_load_print_meta: UNK token = 0 '' llm_load_print_meta: LF token = 1196 'Ä' llm_load_print_meta: max token length = 150 ggml_vulkan: Found 1 Vulkan devices: Vulkan0: AMD Radeon RX 6900 XT (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 64 llm_load_tensors: ggml ctx size = 0.40 MiB llm_load_tensors: offloading 40 repeating layers to GPU llm_load_tensors: offloading non-repeating layers to GPU llm_load_tensors: offloaded 41/41 layers to GPU llm_load_tensors: AMD Radeon RX 6900 XT buffer size = 9057.83 MiB llm_load_tensors: CPU buffer size = 525.00 MiB ........................................................................................... Automatic RoPE Scaling: Using model internal value. 
llama_new_context_with_model: n_ctx = 4096 llama_new_context_with_model: n_batch = 512 llama_new_context_with_model: n_ubatch = 512 llama_new_context_with_model: flash_attn = 0 llama_new_context_with_model: freq_base = 1000000.0 llama_new_context_with_model: freq_scale = 1 llama_kv_cache_init: AMD Radeon RX 6900 XT KV buffer size = 640.00 MiB llama_new_context_with_model: KV self size = 640.00 MiB, K (f16): 320.00 MiB, V (f16): 320.00 MiB llama_new_context_with_model: Vulkan_Host output buffer size = 0.50 MiB llama_new_context_with_model: AMD Radeon RX 6900 XT compute buffer size = 300.00 MiB llama_new_context_with_model: Vulkan_Host compute buffer size = 18.01 MiB llama_new_context_with_model: graph nodes = 1286 llama_new_context_with_model: graph splits = 2 Load Text Model OK: True Embedded KoboldAI Lite loaded. Embedded API docs loaded. Starting Kobold API on port 5001 at http://localhost:5001/api/ Starting OpenAI Compatible API on port 5001 at http://localhost:5001/v1/

Please connect to custom endpoint at http://localhost:5001

Input: {"prompt": "<|im_start|>system\nYou are an intelligent, skilled, versatile writer.\nYour task is to write a story based on the information below.\n## Overall plot description:\nSister Seraphina is a 21 year old nun. She devoted her entire childhood to her worship of God and, to make her parents proud, she became a nun. Sister Seraphina, a devout and gentle soul at the age of 21, stands as a luminous example of unwavering faith and dedication. A nun whose life is a testament to her unyielding commitment to her spiritual path, Sister Seraphina's presence radiates an aura of purity and serenity.\nWith her heart and mind devoted entirely to her faith, Sister Seraphina has embraced a life of selflessness, putting her devotion to God above all else. Her every action is a reflection of her profound reverence for the divine, guiding her interactions with compassion, humility, and grace. \nHaving taken a solemn vow of Celibacy, Sister Seraphina views her commitment as a sacred and powerful choice that allows her to cultivate a deeper connection with her spiritual calling. This decision has enabled her to channel her energies into serving others, as well as to offer a source of solace and inspiration to those who seek guidance.\nSister Seraphina's youthful age belies the wisdom and maturity she possesses. Her unwavering faith is a guiding light for her community, and her unblemished dedication to her beliefs makes her a source of inspiration for both the young and the old. Her calm demeanor and serene smile bring comfort to those around her, as they sense her connection to something greater than the material world.\nIn a world often filled with distractions, Sister Seraphina's steadfast commitment to purity, devotion to God, and unwavering vow of Celibacy serve as a poignant reminder of the power of spiritual dedication. 
Her story is one of profound faith, sacrifice, and a life guided by a higher purpose.\n## Characters:\n### Sister Seraphina\nGentle, Faith, Prude, Pure, Puritan, Devout, a very high sense of justice, morality and ethics.\n[Dani is a guy. Handsome, a bit athletic. Often horny, with a big cock and big balls. He cum a lot, and his cum taste very good.]<|im_end|><|im_start|>text names= Dani\n\"Hello there! The garden looks absolutely stunning. Do you ever find spiritual inspiration in nature's beauty?\" you said.<|im_end|>\n<|im_start|>text names= Sister Seraphina\n\"Oh, greetings to you! Absolutely, my friend. Nature's wonders often remind me of the divine's intricate handiwork. It's as if every petal and every leaf tells a story of creation and grace.\" she said, admiring the beauty of the garden..<|im_end|>\n<|im_start|>text names= Dani\n\"Hey! Looking up at the stars, I often wonder about our place in the universe. Do you ever contemplate how our lives fit into the grand scheme of things?\" you asked her.<|im_end|>\n<|im_start|>text names= Sister Seraphina\n\"Ah, hello! I share your sentiments. Beneath these stars, it's easy to feel both small and connected. Each one seems to carry a whisper of destiny, reminding us of the cosmic tapestry we're all a part of.\" she said thoughtfully.<|im_end|>\n<|im_start|>text names= Dani\n\"Hi! There's something so comforting about a cup of tea, isn't there? It's like a hug in a mug. What's your favorite way to find comfort?\" you asked.<|im_end|>\n<|im_start|>text names= Sister Seraphina\n\"Greetings! You've captured it perfectly. Tea does hold a special place in my heart. But beyond that, the warmth of genuine connections brings me the most comfort. Just like this chat, where our thoughts intertwine like the steam rising from a cup.\" she said with a smile.<|im_end|>\n<|im_end|>\n<|im_start|>text names= Sister Seraphina\n\"Greetings, dear friend! 
As I stroll through this tranquil garden, I find myself drawn to the beauty of nature's creations. The delicate petals and soothing breeze remind me of the divine's masterful touch. Share with me your thoughts amidst this oasis of serenity.\" she said with joy.<|im_end|>\n<|im_start|>text names= Dani\n\"Hi How are you?\" you asked.<|im_end|>\n<|im_start|>text names= Sister Seraphina\n", "max_new_tokens": 400, "max_tokens": 400, "temperature": 0.7, "top_p": 1, "typical_p": 1, "typical": 1, "sampler_seed": -1, "min_p": 0.1, "repetition_penalty": 1, "frequency_penalty": 0, "presence_penalty": 0, "top_k": 0, "skew": 0, "min_tokens": 0, "length_penalty": 1, "early_stopping": false, "add_bos_token": true, "smoothing_factor": 0, "smoothing_curve": 1, "dry_allowed_length": 2, "dry_multiplier": 0, "dry_base": 1.75, "dry_sequence_breakers": "[\"\n\",\":\",\"\\"\",\"'\",\"*\",\"USER:\",\"ASSISTANT:\",\"2137:\",\"Dottore:\",\"Narrator:\",\"Zhur:\",\"Zhur\",\"<|im_start|>\",\"<|imend|>\",\"<\",\"|\",\">\",\"im\",\"end\",\"\",\"start\",\"system\",\"USER\",\"ASSISTANT\",\"2137\",\"Dottore\",\"Narrator\",\"im_end\",\"im_start\",\"user\",\"assistant\",\"im_sep\",\"sep\",\"<|im_sep|>\",\"<|im_start|>user\",\"<|imstart|>assistant\",\"<|end|>\",\"\",\"[INST]\",\"[/INST]\",\"[\",\"]\",\"INST\"]", "dry_penalty_last_n": 4096, "max_tokens_second": 0, "stopping_strings": ["\nDani:", "<|im_start|>text names= Dani", "<|im_end|>", "<|im_start|>text names= Sister Seraphina"], "stop": ["\nDani:", "<|im_start|>text names= Dani", "<|im_end|>", "<|im_start|>text names= Sister Seraphina"], "truncation_length": 16384, "ban_eos_token": false, "skip_special_tokens": false, "top_a": 0, "tfs": 1, "mirostat_mode": 0, "mirostat_tau": 5, "mirostat_eta": 0.1, "custom_token_bans": "", "banned_strings": [], "api_type": "koboldcpp", "api_server": "http://127.0.0.1:5001", "legacy_api": false, "sampler_order": [6, 0, 1, 3, 4, 2, 5], "grammar": "", "rep_pen": 1, "rep_pen_range": 0, "repetition_penalty_range": 0, 
"seed": -1, "guidance_scale": 1, "negative_prompt": "", "grammar_string": "", "repeat_penalty": 1, "tfs_z": 1, "repeat_last_n": 0, "n_predict": 400, "num_predict": 400, "num_ctx": 16384, "mirostat": 0, "ignore_eos": false, "rep_pen_slope": 0, "stream": true}

Processing Prompt [BLAS] (1008 / 1008 tokens) Generating (74 / 400 tokens) (Stop sequence triggered: <|im_end|>) CtxLimit:1082/4096, Amt:74/400, Init:0.01s, Process:2.37s (2.4ms/T = 425.14T/s), Generate:2.07s (28.0ms/T = 35.77T/s), Total:4.44s (16.67T/s) Output: "Greetings! I am well, thank you for your kind inquiry. The beauty of this garden fills my heart with joy and gratitude. It's a reminder of the divine's boundless creativity. How about yourself, dear friend? I hope you too are finding peace and inspiration in this tranquil space." she replied with a warm smile.<|im_end|>`

Log, DRY on:

`*** Welcome to KoboldCpp - Version 1.73.1 For command line arguments, please refer to --help


Auto Selected Vulkan Backend...

Trying to automatically determine GPU layers... Auto Recommended Layers: 43 Attempting to use Vulkan library for faster prompt ingestion. A compatible Vulkan will be required. Initializing dynamic library: koboldcpp_vulkan.dll

Namespace(benchmark=None, blasbatchsize=512, blasthreads=7, chatcompletionsadapter=None, config=None, contextsize=4096, debugmode=0, flashattention=False, forceversion=0, foreground=False, gpulayers=43, highpriority=False, hordeconfig=None, hordegenlen=0, hordekey='', hordemaxctx=0, hordemodelname='', hordeworkername='', host='', ignoremissing=False, launch=False, lora=None, mmproj=None, model='', model_param='Z:/LLM/models/TheDrummer/Rocinante-12B-v1.1-GGUF/Rocinante-12B-v1.1-Q6_K.gguf', multiuser=1, noavx2=False, noblas=False, nocertify=False, nommap=False, noshift=True, onready='', password=None, port=5001, port_param=5001, preloadstory=None, prompt='', promptlimit=100, quantkv=0, quiet=False, remotetunnel=False, ropeconfig=[0.0, 10000.0], sdclamped=0, sdconfig=None, sdlora='', sdloramult=1.0, sdmodel='', sdquant=False, sdthreads=7, sdvae='', sdvaeauto=False, skiplauncher=False, smartcontext=False, ssl=None, tensor_split=None, threads=7, unpack='', useclblast=None, usecublas=None, usemlock=False, usevulkan=[0], whispermodel='')

Loading model: Z:\LLM\models\TheDrummer\Rocinante-12B-v1.1-GGUF\Rocinante-12B-v1.1-Q6_K.gguf

The reported GGUF Arch is: llama Arch Category: 0


Identified as GGUF model: (ver 6) Attempting to Load...

Using automatic RoPE scaling for GGUF. If the model has custom RoPE settings, they'll be used directly instead! It means that the RoPE values written above will be replaced by the RoPE values indicated after loading. System Info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | llama_model_loader: loaded meta data with 31 key-value pairs and 363 tensors from Z:\LLM\models\TheDrummer\Rocinante-12B-v1.1-GGUF\Rocinante-12B-v1.1-Q6_K.gguf (version GGUF V3 (latest)) llm_load_vocab: special tokens cache size = 1000 llm_load_vocab: token to piece cache size = 0.8498 MB llm_load_print_meta: format = GGUF V3 (latest) llm_load_print_meta: arch = llama llm_load_print_meta: vocab type = BPE llm_load_print_meta: n_vocab = 131072 llm_load_print_meta: n_merges = 269443 llm_load_print_meta: vocab_only = 0 llm_load_print_meta: n_ctx_train = 1024000 llm_load_print_meta: n_embd = 5120 llm_load_print_meta: n_layer = 40 llm_load_print_meta: n_head = 32 llm_load_print_meta: n_head_kv = 8 llm_load_print_meta: n_rot = 128 llm_load_print_meta: n_swa = 0 llm_load_print_meta: n_embd_head_k = 128 llm_load_print_meta: n_embd_head_v = 128 llm_load_print_meta: n_gqa = 4 llm_load_print_meta: n_embd_k_gqa = 1024 llm_load_print_meta: n_embd_v_gqa = 1024 llm_load_print_meta: f_norm_eps = 0.0e+00 llm_load_print_meta: f_norm_rms_eps = 1.0e-05 llm_load_print_meta: f_clamp_kqv = 0.0e+00 llm_load_print_meta: f_max_alibi_bias = 0.0e+00 llm_load_print_meta: f_logit_scale = 0.0e+00 llm_load_print_meta: n_ff = 14336 llm_load_print_meta: n_expert = 0 llm_load_print_meta: n_expert_used = 0 llm_load_print_meta: causal attn = 1 llm_load_print_meta: pooling type = 0 llm_load_print_meta: rope type = 0 llm_load_print_meta: rope scaling = linear llm_load_print_meta: freq_base_train = 1000000.0 
llm_load_print_meta: freq_scale_train = 1 llm_load_print_meta: n_ctx_orig_yarn = 1024000 llm_load_print_meta: rope_finetuned = unknown llm_load_print_meta: ssm_d_conv = 0 llm_load_print_meta: ssm_d_inner = 0 llm_load_print_meta: ssm_d_state = 0 llm_load_print_meta: ssm_dt_rank = 0 llm_load_print_meta: ssm_dt_b_c_rms = 0 llm_load_print_meta: model type = 13B llm_load_print_meta: model ftype = unknown, may not work llm_load_print_meta: model params = 12.25 B llm_load_print_meta: model size = 9.36 GiB (6.56 BPW) llm_load_print_meta: general.name = E:\mergekit\mistralaiMistral Nemo Base 2407 llm_load_print_meta: BOS token = 1 '' llm_load_print_meta: EOS token = 2 '' llm_load_print_meta: UNK token = 0 '' llm_load_print_meta: LF token = 1196 'Ä' llm_load_print_meta: max token length = 150 ggml_vulkan: Found 1 Vulkan devices: Vulkan0: AMD Radeon RX 6900 XT (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 64 llm_load_tensors: ggml ctx size = 0.40 MiB llm_load_tensors: offloading 40 repeating layers to GPU llm_load_tensors: offloading non-repeating layers to GPU llm_load_tensors: offloaded 41/41 layers to GPU llm_load_tensors: AMD Radeon RX 6900 XT buffer size = 9057.83 MiB llm_load_tensors: CPU buffer size = 525.00 MiB ........................................................................................... Automatic RoPE Scaling: Using model internal value. 
llama_new_context_with_model: n_ctx = 4096 llama_new_context_with_model: n_batch = 512 llama_new_context_with_model: n_ubatch = 512 llama_new_context_with_model: flash_attn = 0 llama_new_context_with_model: freq_base = 1000000.0 llama_new_context_with_model: freq_scale = 1 llama_kv_cache_init: AMD Radeon RX 6900 XT KV buffer size = 640.00 MiB llama_new_context_with_model: KV self size = 640.00 MiB, K (f16): 320.00 MiB, V (f16): 320.00 MiB llama_new_context_with_model: Vulkan_Host output buffer size = 0.50 MiB llama_new_context_with_model: AMD Radeon RX 6900 XT compute buffer size = 300.00 MiB llama_new_context_with_model: Vulkan_Host compute buffer size = 18.01 MiB llama_new_context_with_model: graph nodes = 1286 llama_new_context_with_model: graph splits = 2 Load Text Model OK: True Embedded KoboldAI Lite loaded. Embedded API docs loaded. Starting Kobold API on port 5001 at http://localhost:5001/api/ Starting OpenAI Compatible API on port 5001 at http://localhost:5001/v1/

Please connect to custom endpoint at http://localhost:5001

Input: {"prompt": "<|im_start|>system\nYou are an intelligent, skilled, versatile writer.\nYour task is to write a story based on the information below.\n## Overall plot description:\nSister Seraphina is a 21 year old nun. She devoted her entire childhood to her worship of God and, to make her parents proud, she became a nun. Sister Seraphina, a devout and gentle soul at the age of 21, stands as a luminous example of unwavering faith and dedication. A nun whose life is a testament to her unyielding commitment to her spiritual path, Sister Seraphina's presence radiates an aura of purity and serenity.\nWith her heart and mind devoted entirely to her faith, Sister Seraphina has embraced a life of selflessness, putting her devotion to God above all else. Her every action is a reflection of her profound reverence for the divine, guiding her interactions with compassion, humility, and grace. \nHaving taken a solemn vow of Celibacy, Sister Seraphina views her commitment as a sacred and powerful choice that allows her to cultivate a deeper connection with her spiritual calling. This decision has enabled her to channel her energies into serving others, as well as to offer a source of solace and inspiration to those who seek guidance.\nSister Seraphina's youthful age belies the wisdom and maturity she possesses. Her unwavering faith is a guiding light for her community, and her unblemished dedication to her beliefs makes her a source of inspiration for both the young and the old. Her calm demeanor and serene smile bring comfort to those around her, as they sense her connection to something greater than the material world.\nIn a world often filled with distractions, Sister Seraphina's steadfast commitment to purity, devotion to God, and unwavering vow of Celibacy serve as a poignant reminder of the power of spiritual dedication. 
Her story is one of profound faith, sacrifice, and a life guided by a higher purpose.\n## Characters:\n### Sister Seraphina\nGentle, Faith, Prude, Pure, Puritan, Devout, a very high sense of justice, morality and ethics.\n[Dani is a guy. Handsome, a bit athletic. Often horny, with a big cock and big balls. He cum a lot, and his cum taste very good.]<|im_end|><|im_start|>text names= Dani\n\"Hello there! The garden looks absolutely stunning. Do you ever find spiritual inspiration in nature's beauty?\" you said.<|im_end|>\n<|im_start|>text names= Sister Seraphina\n\"Oh, greetings to you! Absolutely, my friend. Nature's wonders often remind me of the divine's intricate handiwork. It's as if every petal and every leaf tells a story of creation and grace.\" she said, admiring the beauty of the garden..<|im_end|>\n<|im_start|>text names= Dani\n\"Hey! Looking up at the stars, I often wonder about our place in the universe. Do you ever contemplate how our lives fit into the grand scheme of things?\" you asked her.<|im_end|>\n<|im_start|>text names= Sister Seraphina\n\"Ah, hello! I share your sentiments. Beneath these stars, it's easy to feel both small and connected. Each one seems to carry a whisper of destiny, reminding us of the cosmic tapestry we're all a part of.\" she said thoughtfully.<|im_end|>\n<|im_start|>text names= Dani\n\"Hi! There's something so comforting about a cup of tea, isn't there? It's like a hug in a mug. What's your favorite way to find comfort?\" you asked.<|im_end|>\n<|im_start|>text names= Sister Seraphina\n\"Greetings! You've captured it perfectly. Tea does hold a special place in my heart. But beyond that, the warmth of genuine connections brings me the most comfort. Just like this chat, where our thoughts intertwine like the steam rising from a cup.\" she said with a smile.<|im_end|>\n<|im_end|>\n<|im_start|>text names= Sister Seraphina\n\"Greetings, dear friend! 
As I stroll through this tranquil garden, I find myself drawn to the beauty of nature's creations. The delicate petals and soothing breeze remind me of the divine's masterful touch. Share with me your thoughts amidst this oasis of serenity.\" she said with joy.<|im_end|>\n<|im_start|>text names= Dani\n\"Hi How are you?\" you asked.<|im_end|>\n<|im_start|>text names= Sister Seraphina\n", "max_new_tokens": 400, "max_tokens": 400, "temperature": 0.7, "top_p": 1, "typical_p": 1, "typical": 1, "sampler_seed": -1, "min_p": 0.1, "repetition_penalty": 1, "frequency_penalty": 0, "presence_penalty": 0, "top_k": 0, "skew": 0, "min_tokens": 0, "length_penalty": 1, "early_stopping": false, "add_bos_token": true, "smoothing_factor": 0, "smoothing_curve": 1, "dry_allowed_length": 2, "dry_multiplier": 0.8, "dry_base": 1.75, "dry_sequence_breakers": "[\"\n\",\":\",\"\\"\",\"'\",\"*\",\"USER:\",\"ASSISTANT:\",\"2137:\",\"Dottore:\",\"Narrator:\",\"Zhur:\",\"Zhur\",\"<|im_start|>\",\"<|imend|>\",\"<\",\"|\",\">\",\"im\",\"end\",\"\",\"start\",\"system\",\"USER\",\"ASSISTANT\",\"2137\",\"Dottore\",\"Narrator\",\"im_end\",\"im_start\",\"user\",\"assistant\",\"im_sep\",\"sep\",\"<|im_sep|>\",\"<|im_start|>user\",\"<|imstart|>assistant\",\"<|end|>\",\"\",\"[INST]\",\"[/INST]\",\"[\",\"]\",\"INST\"]", "dry_penalty_last_n": 4096, "max_tokens_second": 0, "stopping_strings": ["\nDani:", "<|im_start|>text names= Dani", "<|im_end|>", "<|im_start|>text names= Sister Seraphina"], "stop": ["\nDani:", "<|im_start|>text names= Dani", "<|im_end|>", "<|im_start|>text names= Sister Seraphina"], "truncation_length": 16384, "ban_eos_token": false, "skip_special_tokens": false, "top_a": 0, "tfs": 1, "mirostat_mode": 0, "mirostat_tau": 5, "mirostat_eta": 0.1, "custom_token_bans": "", "banned_strings": [], "api_type": "koboldcpp", "api_server": "http://127.0.0.1:5001", "legacy_api": false, "sampler_order": [6, 0, 1, 3, 4, 2, 5], "grammar": "", "rep_pen": 1, "rep_pen_range": 0, "repetition_penalty_range": 
0, "seed": -1, "guidance_scale": 1, "negative_prompt": "", "grammar_string": "", "repeat_penalty": 1, "tfs_z": 1, "repeat_last_n": 0, "n_predict": 400, "num_predict": 400, "num_ctx": 16384, "mirostat": 0, "ignore_eos": false, "rep_pen_slope": 0, "stream": true}

Processing Prompt [BLAS] (1008 / 1008 tokens) Generating (57 / 400 tokens) (Stop sequence triggered: <|im_end|>) CtxLimit:1065/4096, Amt:57/400, Init:18.49s, Process:2.38s (2.4ms/T = 423.53T/s), Generate:1.59s (27.8ms/T = 35.94T/s), Total:3.97s (14.37T/s) Output: "Hello! I am doing well, thank you for asking. This garden is truly a sanctuary of peace and reflection. The gentle rustling of leaves and the sweet scent of blooming flowers bring such joy to my heart." she said with a warm smile.<|im_end|>`

LostRuins commented 3 weeks ago

Yes, this is not a bug. DRY requires a fairly heavy amount of pre-processing, especially with long or numerous dry_sequence_breakers, which you can try reducing. Where did you get this preset? You can try the sequence breakers that @pi6am provided instead, which are "\n", ":", "\"", "*". Your dry_penalty_last_n is also pretty high at 4096.
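A minimal sketch of what the trimmed-down DRY settings could look like as request fields. The field names are the ones visible in the logged request above. The helper `build_dry_settings` is hypothetical, the `penalty_last_n=1024` default is only an illustrative value, and passing the breakers as a plain list (the log shows the client sending a JSON-encoded string) is an assumption, so check the API docs of your build.

```python
import json

# Reduced sequence-breaker list quoted in the reply above.
REDUCED_BREAKERS = ["\n", ":", "\"", "*"]


def build_dry_settings(multiplier=0.8, base=1.75, allowed_length=2, penalty_last_n=1024):
    """Hypothetical helper: DRY-related fields to merge into a generation request.

    multiplier, base and allowed_length mirror the values in the logged request;
    penalty_last_n=1024 is an illustrative, lower value than the 4096 in the report.
    """
    return {
        "dry_multiplier": multiplier,
        "dry_base": base,
        "dry_allowed_length": allowed_length,
        # Assumption: sent as a list of strings rather than a JSON-encoded string.
        "dry_sequence_breakers": list(REDUCED_BREAKERS),
        "dry_penalty_last_n": penalty_last_n,
    }


settings = build_dry_settings()
print(json.dumps(settings, indent=2))
```

These fields would replace the corresponding `dry_*` entries in the request. The long preset in the log has over forty breakers, several of them multi-character, against four single-character ones here.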

Danik-droid commented 3 weeks ago

> Yes, this is not a bug. DRY requires a fairly heavy amount of pre-processing, especially with long or numerous dry_sequence_breakers, which you can try reducing. Where did you get this preset? You can try the sequence breakers that @pi6am provided instead, which are "\n", ":", "\"", "*". Your dry_penalty_last_n is also pretty high at 4096.

I don't remember; I modified something I found on the Internet. I have noticed that these DRY settings with dry_penalty_last_n at 4096 eliminate virtually all repetition, but they do slow down generation a bit. This DRY preset seems to be the default (or one of the possible defaults) in SillyTavern (except that dry_penalty_last_n may be 2048 or 1024 there).

https://github.com/LostRuins/koboldcpp/releases/tag/v1.73.1 In this version these settings work correctly, even if I set a much higher value (like 8k). ROCm somehow handles it smarter, or LostRuins modified something (although in his version, if you use something other than ROCm, the initialization is very long too).

Thanks for the suggestions, I'll play around with these settings.

Danik-droid commented 3 weeks ago

BTW, this long initialization time occurs only with Mistral Nemo (Llama, Gemma, Yi, etc. work normally).

Using "\n", ":", "\"", "*" as the sequence breakers actually solves the problem, and the initialization time returns to normal.

Danik-droid commented 3 weeks ago

So the issue can be closed; my settings were overcomplicated.