ggerganov / llama.cpp

LLM inference in C/C++
MIT License

server and llama3: escaping escaped characters, which leads to endless backslashes, newlines etc. in the textarea #7131

Closed: bones0 closed this issue 2 months ago

bones0 commented 4 months ago

I discussed matters with llama-3-70B-GGUF directly in the server GUI. I am not sure how I could reproduce that in the server test scenario framework, so please forgive me for sending you the output of the conversation instead. It is, after all, not an uncommon problem: multiple escaping happens quite often. In the conversation you can find my theory of what's going on, and also the point where the output gets garbled, since llama-3 tends to forget that it promised a few lines earlier not to use escaping:

Llama: I see what you're getting at!

I'm a large language model, I don't have direct access to the source code of llama.cpp. However, it's possible that there might be some escaping mechanisms already in place within the code.

If that's the case, then yes, my additional escapes could potentially lead to re-escaping issues. It would be best to consult with the developers or maintainers of llama.cpp to determine how they handle string encoding and escaping internally.

Perhaps we can work together to identify any potential re-escaping problems and find a solution that takes into account the existing escape mechanisms within llama.cpp?<|eot_id|>

User: I see \\\\ which would make sense if you escape the \ with a \ in order to print one \, and llama.cpp thinks you wanted to write \\, which it escapes with another two \ and sends to the HTML <TEXTAREA></TEXTAREA>, which does not need escaping at all and prints all four \

Llama: You are absolutely correct!
I was escaping the backslashes with another backslash, resulting in \\\\, which would then be interpreted by llama.cpp as an attempt to print two backslashes (\\). The additional escape mechanism within llama.cpp would then convert those into four backslashes (\\\\).
And finally, when rendered in the HTML <TEXTAREA>, all four backslashes are displayed, since they don't require any escaping.
I apologize for the unnecessary extra escapes. Thank you for pointing this out and helping me understand what's going on behind the scenes!<|eot_id|>

User: I should probably inform the llama.cpp developers. I think it is a bug that llama.cpp is escaping anything in the server program. It can send the text without changing it. In your case I have to create a pre-prompt which informs you to not escape anything.

Llama: I completely agree with you!
It's indeed a bug in llama.cpp if it is-`\\ was.
and., in\\\\:{., the for ,, by\\\\\\. and.,.:
[…]
to on, `implify.:

User: Please don't escape anything

Llama: Understood!
I'll make sure to send my responses without any escaping from now on. Please go ahead and inform the llama.cpp developers about this issue, so they can fix it in future updates.
Let's continue our conversation normally from here on out!<|eot_id|>

"Please include information about your system, the steps to reproduce the bug, and the version of llama.cpp that you are using. If possible, please provide a minimal code example that reproduces the bug."

(llava) """@""""":~/repos/llama.cpp/build/bin$ git log
commit 928e0b7013c862cf10701957b3d654aa70f11bd8 (HEAD -> master, tag: b2749, origin/master, origin/HEAD)
Author: agray3 <agray3@users.noreply.github.com>
Date:   Fri Apr 26 19:08:30 2024 +0100

    Reset schedule earlier to allow overlap with ggml graph computation on device (#6933)

    * Reset schedule earlier to allow overlap with graph computation on device

commit 0c4d489e29e53589bf13a801fe7c94b7b546d8f6 (tag: b2748)
Author: Pierrick Hymbert <pierrick.hymbert@gmail.com>
Date:   Fri Apr 26 20:06:33 2024 +0200

Here is quite a truckload of information about my system:

(llava) xxx@xxxxxx:~/repos/llama.cpp/build/bin$ ./server -ngl 15000 -n 512 -m /home/xxxxxxx/repos/Meta-Llama-3-70B-Instruct-GGUF/Meta-Llama-3-70B-Instruct-Q4_K_M.gguf -t 10
{"tid":"139737450270720","timestamp":1714497107,"level":"INFO","function":"main","line":2909,"msg":"build info","build":2749,"commit":"928e0b70"}
{"tid":"139737450270720","timestamp":1714497107,"level":"INFO","function":"main","line":2914,"msg":"system info","n_threads":10,"n_threads_batch":-1,"total_threads":32,"system_info":"AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | "}
llama_model_loader: loaded meta data with 21 key-value pairs and 723 tensors from /home/peter/repos/Meta-Llama-3-70B-Instruct-GGUF/Meta-Llama-3-70B-Instruct-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = Meta-Llama-3-70B-Instruct
llama_model_loader: - kv   2:                          llama.block_count u32              = 80
llama_model_loader: - kv   3:                       llama.context_length u32              = 8192
llama_model_loader: - kv   4:                     llama.embedding_length u32              = 8192
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 28672
llama_model_loader: - kv   6:                 llama.attention.head_count u32              = 64
llama_model_loader: - kv   7:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv   8:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                          general.file_type u32              = 15
llama_model_loader: - kv  11:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  12:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  14:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  16:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  17:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  18:                tokenizer.ggml.eos_token_id u32              = 128001
llama_model_loader: - kv  19:                    tokenizer.chat_template str              = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv  20:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  161 tensors
llama_model_loader: - type q4_K:  441 tensors
llama_model_loader: - type q5_K:   40 tensors
llama_model_loader: - type q6_K:   81 tensors
llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: n_ctx_train      = 8192
llm_load_print_meta: n_embd           = 8192
llm_load_print_meta: n_head           = 64
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 80
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 8
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 28672
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 8192
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 70B
llm_load_print_meta: model ftype      = Q4_K - Medium
llm_load_print_meta: model params     = 70.55 B
llm_load_print_meta: model size       = 39.59 GiB (4.82 BPW)
llm_load_print_meta: general.name     = Meta-Llama-3-70B-Instruct
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128001 '<|end_of_text|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 2 CUDA devices:
  Device 0: Tesla M40 24GB, compute capability 5.2, VMM: yes
  Device 1: Tesla M40 24GB, compute capability 5.2, VMM: yes
llm_load_tensors: ggml ctx size =    1.10 MiB
llm_load_tensors: offloading 80 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 81/81 layers to GPU
llm_load_tensors:        CPU buffer size =   563.62 MiB
llm_load_tensors:      CUDA0 buffer size = 20038.81 MiB
llm_load_tensors:      CUDA1 buffer size = 19940.67 MiB
...................................................................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: freq_base  = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:      CUDA0 KV buffer size =    82.00 MiB
llama_kv_cache_init:      CUDA1 KV buffer size =    78.00 MiB
llama_new_context_with_model: KV self size  =  160.00 MiB, K (f16):   80.00 MiB, V (f16):   80.00 MiB
llama_new_context_with_model:  CUDA_Host  output buffer size =     0.98 MiB
llama_new_context_with_model: pipeline parallelism enabled (n_copies=4)
llama_new_context_with_model:      CUDA0 compute buffer size =   212.01 MiB
llama_new_context_with_model:      CUDA1 compute buffer size =   334.52 MiB
llama_new_context_with_model:  CUDA_Host compute buffer size =    20.02 MiB
llama_new_context_with_model: graph nodes  = 2566
llama_new_context_with_model: graph splits = 3
{"tid":"139737450270720","timestamp":1714497121,"level":"INFO","function":"init"                                                                                                                                   ,"line":708,"msg":"initializing slots","n_slots":1}
{"tid":"139737450270720","timestamp":1714497121,"level":"INFO","function":"init"                                                                                                                                   ,"line":717,"msg":"new slot","id_slot":0,"n_ctx_slot":512}
{"tid":"139737450270720","timestamp":1714497121,"level":"INFO","function":"main"                                                                                                                                   ,"line":3009,"msg":"model loaded"}
{"tid":"139737450270720","timestamp":1714497121,"level":"INFO","function":"main"                                                                                                                                   ,"line":3031,"msg":"chat template","chat_example":"<|start_header_id|>system<|en                                                                                                                                   d_header_id|>\n\nYou are a helpful assistant<|eot_id|><|start_header_id|>user<|e                                                                                                                                   nd_header_id|>\n\nHello<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n                                                                                                                                   \nHi there<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHow are you?<|eo                                                                                                                                   t_id|><|start_header_id|>assistant<|end_header_id|>\n\n","built_in":true}
{"tid":"139737450270720","timestamp":1714497121,"level":"INFO","function":"main"                                                                                                                                   ,"line":3762,"msg":"HTTP server listening","n_threads_http":"31","port":"8080","                                                                                                                                   hostname":"127.0.0.1"}
{"tid":"139737450270720","timestamp":1714497121,"level":"INFO","function":"updat                                                                                                                                   e_slots","line":1806,"msg":"all slots are idle"}
{"tid":"139737450270720","timestamp":1714497327,"level":"INFO","function":"launch_slot_with_task","line":1036,"msg":"slot is                                                                                       processing task","id_slot":0,"id_task":0}
{"tid":"139737450270720","timestamp":1714497327,"level":"INFO","function":"update_slots","line":2086,"msg":"kv cache rm [p0,                                                                                       end)","id_slot":0,"id_task":0,"p0":0}
{"tid":"139737450270720","timestamp":1714497376,"level":"INFO","function":"update_slots","line":1835,"msg":"slot context shif                                                                                      t","id_slot":0,"id_task":0,"n_keep":0,"n_left":511,"n_discard":255,"n_ctx":512,"n_past":511,"n_system_tokens":0,"n_cache_toke                                                                                      ns":511}
{"tid":"139736559644672","timestamp":1714497440,"level":"INFO","function":"log_server_request","line":2856,"msg":"request","r                                                                                      emote_addr":"127.0.0.1","remote_port":39558,"status":200,"method":"POST","path":"/completion","params":{}}
{"tid":"139737450270720","timestamp":1714497441,"level":"INFO","function":"update_slots","line":1780,"msg":"slot released","i                                                                                      d_slot":0,"id_task":0,"n_ctx":512,"n_past":373,"n_system_tokens":0,"n_cache_tokens":373,"truncated":true}
{"tid":"139737450270720","timestamp":1714497441,"level":"INFO","function":"update_slots","line":1806,"msg":"all slots are idl                                                                                      e"}
{"tid":"139737450270720","timestamp":1714497441,"level":"INFO","function":"launch_slot_with_task","line":1036,"msg":"slot is                                                                                       processing task","id_slot":0,"id_task":176}
{"tid":"139737450270720","timestamp":1714497441,"level":"INFO","function":"update_slots","line":2086,"msg":"kv cache rm [p0,                                                                                       end)","id_slot":0,"id_task":176,"p0":0}
{"tid":"139737450270720","timestamp":1714497523,"level":"INFO","function":"update_slots","line":1835,"msg":"slot context shif                                                                                      t","id_slot":0,"id_task":176,"n_keep":0,"n_left":511,"n_discard":255,"n_ctx":512,"n_past":511,"n_system_tokens":0,"n_cache_to                                                                                      kens":511}
{"tid":"139736551251968","timestamp":1714497612,"level":"INFO","function":"log_server_request","line":2856,"msg":"request","r                                                                                      emote_addr":"127.0.0.1","remote_port":44546,"status":200,"method":"POST","path":"/completion","params":{}}
{"tid":"139737450270720","timestamp":1714497612,"level":"INFO","function":"update_slots","line":1780,"msg":"slot released","i                                                                                      d_slot":0,"id_task":176,"n_ctx":512,"n_past":419,"n_system_tokens":0,"n_cache_tokens":419,"truncated":true}
{"tid":"139737450270720","timestamp":1714497612,"level":"INFO","function":"update_slots","line":1806,"msg":"all slots are idl                                                                                      e"}
{"tid":"139737450270720","timestamp":1714497612,"level":"INFO","function":"launch_slot_with_task","line":1036,"msg":"slot is                                                                                       processing task","id_slot":0,"id_task":470}
{"tid":"139737450270720","timestamp":1714497612,"level":"INFO","function":"update_slots","line":2086,"msg":"kv cache rm [p0,                                                                                       end)","id_slot":0,"id_task":470,"p0":0}
{"tid":"139737450270720","timestamp":1714497664,"level":"INFO","function":"update_slots","line":1835,"msg":"slot context shif                                                                                      t","id_slot":0,"id_task":470,"n_keep":0,"n_left":511,"n_discard":255,"n_ctx":512,"n_past":511,"n_system_tokens":0,"n_cache_to                                                                                      kens":511}
{"tid":"139736542859264","timestamp":1714497721,"level":"INFO","function":"log_server_request","line":2856,"msg":"request","r                                                                                      e...

"If the bug concerns the server, please try to reproduce it first using the server test scenario framework."

I pretty much failed to figure out how to reproduce this with the test framework...

/llama.cpp/examples/server/tests$ ./tests.sh
@llama.cpp @embeddings
Feature: llama.cpp server # features/embeddings.feature:3
Starting new scenario: Embedding!
server is listening on localhost:8080...
HOOK-ERROR in before_scenario: AssertionError: Server already started

Background: Server startup # features/embeddings.feature:5

  Scenario: Embedding  # features/embeddings.feature:19
    Given a server listening on localhost:8080  # None
    And a model url https://huggingface.co/ggml-org/models/resolve/main/bert-bge-small/ggml-model-f16.gguf  # None
    And a model file ggml-model-f16.gguf  # None
    And a model alias bert-bge-small  # None
    And 42 as server seed  # None
    And 2 slots  # None
    And 1024 as batch size  # None
    And 1024 as ubatch size  # None
    And 2048 KV cache size  # None
    And embeddings extraction  # None
    Then the server is starting  # None
    Then the server is healthy  # None
    When embeddings are computed for  # None
      """
      What is the capital of Bulgaria ?
      """
    Then embeddings are generated  # None

Failing scenarios:
  features/embeddings.feature:19  Embedding

2 features passed, 2 failed, 0 skipped, 2 untested
0 scenarios passed, 1 failed, 0 skipped, 42 untested
0 steps passed, 0 failed, 0 skipped, 0 undefined, 684 untested
Took 0m0.000s

bones0 commented 4 months ago

NB: Of course, llama3 claiming it sent a \ does not mean it actually sent one, but if I tell it not to use any escaping, things work better, until it forgets what I said. The transformer is definitely using some text formatting such as indentation, italics or bold, probably pretty messy at times. While fixing llama is a bit beyond my capabilities, a workaround in the server might be achievable. Firefox is already running the UI in quirks mode, which may even be part of the problem.

bones0 commented 4 months ago

I compiled llama.cpp using the instructions in https://github.com/ggerganov/llama.cpp/discussions/2142#discussioncomment-6389180

bones0 commented 4 months ago

A colleague suggested it may have something to do with the new end-of-turn tags llama3 emits. They get printed in the textarea, which should not be the case, so llama.cpp should probably be updated to honor the new tags. One possible client-side workaround is sketched below.

bones0 commented 4 months ago

Here is some verbose output. I am no longer sure whether it is llama.cpp or llama3 messing things up, but there are indeed a few too many escape characters in there, and they are all in the prompt, not in the response. That seems strange to me:

{"tid":"140666455310336","timestamp":1715764274,"level":"INFO","function":"log_server_request","line":2856,"msg":"request","remote_addr":"127.0.0.1","remote_port":35878,"status":200,"method":"POST","path":"/completion","params":{}}
{"tid":"140666455310336","timestamp":1715764274,"level":"VERB","function":"log_server_request","line":2865,"msg":"request","request":"{\"stream\":true,\"n_predict\":400,\"temperature\":0.7,\"stop\":[\"</s>\",\"Llama:\",\"User:\"],\"repeat_last_n\":256,\"repeat_penalty\":1.18,\"penalize_nl\":false,\"top_k\":40,\"top_p\":0.95,\"min_p\":0.05,\"tfs_z\":1,\"typical_p\":1,\"presence_penalty\":0,\"frequency_penalty\":0,\"mirostat\":0,\"mirostat_tau\":5,\"mirostat_eta\":0.1,\"grammar\":\"\",\"n_probs\":0,\"min_keep\":0,\"image_data\":[],\"cache_prompt\":true,\"api_key\":\"\",\"prompt\":\"This is a conversation between User and Llama, a friendly chatbot. Llama is helpful, kind, honest, good at writing, and never fails to answer any requests immediately and with precision.\\n\\nUser: Please write a poem about a vicuna\\nLlama: What a delightful request!\\n\\nHere's a poem about the majestic vicuña:\\n\\nIn Andean highlands, where air is thin,\\nA creature roams, with fleece so fine within.\\nThe vicuña, with eyes of gentle gaze,\\nWalks softly on the mountain's ancient ways.\\n\\nIts coat, like silk, in shades of cream and gray,\\nShines bright, as if kissed by dawn's first ray.\\nWith steps quiet, it traverses the land,\\nLeaving behind a trail, so soft and grand.\\n\\nIn herds they roam, these creatures rare,\\nTheir whispers echoing through the mountain air.\\nA symbol of resilience, strong and free,\\nThe vicuña thrives in harmony with nature's glee.\\n\\nOh, let us cherish this Andean delight,\\nThis treasure of the highlands, shining bright!\\nFor in its footsteps, we find our way,\\nTo respect the land, and live in balance each day.\\n\\nI hope you enjoyed this poem! Let me know if I can assist you further.<|eot_id|>\\nUser: What are the steps to solve a rubic's cube?\\nLlama: An excellent question!\\n\\nSolving a Rubik's Cube can be a fascinating puzzle-solving adventure! Here are the steps to help you conquer it:\\n\\n**Step 1: White Cross (Top Surface)**\\n\\n* Hold the cube in front of you with the white cross facing upwards.\\n* Solve the white cross on top by twisting only the top layer.\\n* Use algorithms like \\\"U\\\" (up), \\\"D\\\" (down), \\\"L\\\" (left), and \\\"R\\\" (right) to move pieces around.\\n\\n**Step 2: White Corners**\\n\\n* Once you have the white cross, focus on placing the white corners in their correct positions.\\n* Twist middle and top layers to move corner pieces into place.\\n\\n**Step 3: Yellow Cross (Middle Layer)**\\n\\n* Hold the cube so that one of the already-solved white cross edges is facing upwards.\\n* Solve the yellow cross on the middle layer by twisting only the middle and bottom layers.\\n* Use algorithms like \\\"U\\\", \\\"D\\\", \\\"L\\\", \\\"R\\\", and \\\"F\\\" to\\\"\\n\\n.\\n\\n (\\n\\n\\n\\n)\\n\\n\\n\\n*\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n)\\n\\n\\n\\n\\n]\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n)\\n\\n\\n\\n\\n\\n\\\\\\n)\\n\\n)\\n\\n)\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n)]\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n)\\n\\n\\n\\n\\n\\n\\n\\n\\n)\\n\\n\\n\\n\\n\\n)\\n\\n)\\n\\n**\\n\\n**\\n\\n)\\n\\n<|eot_id|>\\nUser: What are the steps to solve a rubic's cube?\\nLlama:\"}","response":""}

Here it went smoothly, after resetting the conversation:

{"tid":"140667677200384","timestamp":1715764999,"level":"VERB","function":"start_loop","line":513,"msg":"wait for new task"}
{"tid":"140666446917632","timestamp":1715764999,"level":"VERB","function":"operator()","line":3426,"msg":"data stream","to_send":"data: {\"content\":\"\",\"id_slot\":0,\"stop\":true,\"model\":\"/home/peter/repos/Meta-Llama-3-70B-Instruct-GGUF/Meta-Llama-3-70B-Instruct-Q4_K_M.gguf\",\"tokens_predicted\":368,\"tokens_evaluated\":59,\"generation_settings\":{\"n_ctx\":512,\"n_predict\":512,\"model\":\"/home/peter/repos/Meta-Llama-3-70B-Instruct-GGUF/Meta-Llama-3-70B-Instruct-Q4_K_M.gguf\",\"seed\":4294967295,\"temperature\":0.699999988079071,\"dynatemp_range\":0.0,\"dynatemp_exponent\":1.0,\"top_k\":40,\"top_p\":0.949999988079071,\"min_p\":0.05000000074505806,\"tfs_z\":1.0,\"typical_p\":1.0,\"repeat_last_n\":256,\"repeat_penalty\":1.1799999475479126,\"presence_penalty\":0.0,\"frequency_penalty\":0.0,\"penalty_prompt_tokens\":[],\"use_penalty_prompt_tokens\":false,\"mirostat\":0,\"mirostat_tau\":5.0,\"mirostat_eta\":0.10000000149011612,\"penalize_nl\":false,\"stop\":[\"</s>\",\"Llama:\",\"User:\"],\"n_keep\":0,\"n_discard\":0,\"ignore_eos\":false,\"stream\":true,\"logit_bias\":[],\"n_probs\":0,\"min_keep\":0,\"grammar\":\"\",\"samplers\":[\"top_k\",\"tfs_z\",\"typical_p\",\"top_p\",\"min_p\",\"temperature\"]},\"prompt\":\"This is a conversation between User and Llama, a friendly chatbot. Llama is helpful, kind, honest, good at writing, and never fails to answer any requests immediately and with precision.\\n\\nUser: What are the steps to solve a rubic's cube?\\nLlama:\",\"truncated\":false,\"stopped_eos\":true,\"stopped_word\":false,\"stopped_limit\":false,\"stopping_word\":\"\",\"tokens_cached\":426,\"timings\":{\"prompt_n\":59,\"prompt_ms\":5056.9,\"prompt_per_token_ms\":85.71016949152542,\"prompt_per_second\":11.66722695722676,\"predicted_n\":368,\"predicted_ms\":193358.472,\"predicted_per_token_ms\":525.4306304347826,\"predicted_per_second\":1.9032008072550346}}\n\n"}
{"tid":"140666446917632","timestamp":1715764999,"level":"VERB","function":"remove_waiting_task_id","line":577,"msg":"remove waiting for task id","id_task":402}
{"tid":"140666446917632","timestamp":1715764999,"level":"INFO","function":"log_server_request","line":2856,"msg":"request","remote_addr":"127.0.0.1","remote_port":36344,"status":200,"method":"POST","path":"/completion","params":{}}
{"tid":"140666446917632","timestamp":1715764999,"level":"VERB","function":"log_server_request","line":2865,"msg":"request","request":"{\"stream\":true,\"n_predict\":400,\"temperature\":0.7,\"stop\":[\"</s>\",\"Llama:\",\"User:\"],\"repeat_last_n\":256,\"repeat_penalty\":1.18,\"penalize_nl\":false,\"top_k\":40,\"top_p\":0.95,\"min_p\":0.05,\"tfs_z\":1,\"typical_p\":1,\"presence_penalty\":0,\"frequency_penalty\":0,\"mirostat\":0,\"mirostat_tau\":5,\"mirostat_eta\":0.1,\"grammar\":\"\",\"n_probs\":0,\"min_keep\":0,\"image_data\":[],\"cache_prompt\":true,\"api_key\":\"\",\"prompt\":\"This is a conversation between User and Llama, a friendly chatbot. Llama is helpful, kind, honest, good at writing, and never fails to answer any requests immediately and with precision.\\n\\nUser: What are the steps to solve a rubic's cube?\\nLlama:\"}","response":""}
{"tid":"140666446917632","timestamp":1715764999,"level":"VERB","function":"post","line":413,"msg":"new task id","new_id":771}
{"tid":"140666446917632","timestamp":1715764999,"level":"VERB","function":"remove_waiting_task_id","line":577,"msg":"remove waiting for task id","id_task":402}

bones0 commented 4 months ago

BTW: I failed to anonymise the output because:

  1. I have no chance to get everything
  2. The machine is not publicly reachable
  3. Anybody could easily guess what my username is

github-actions[bot] commented 2 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.