NB: Of course llama3 claiming it did send a \ does not mean it actually did send one, but if I order it not to use any escaping it works better, until it forgets what I said. The transformer is definitely using some text formatting like indentation, italics or bold, probably pretty messy at times. While fixing llama is a bit beyond my capabilities, a workaround in the server might be achievable. Firefox is already running the UI in quirks mode, which may even be part of the problem.
I compiled llama.cpp using the instructions in https://github.com/ggerganov/llama.cpp/discussions/2142#discussioncomment-6389180
A colleague suggested it may have something to do with the new end-of-turn tags llama3 emits. They got printed into the textarea, which should not be the case, so llama.cpp probably should be updated to honor the new tags.
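If the tags are the culprit, a possible client-side workaround (untested on my side; the extra stop entry is my assumption, the other values are copied from the request logged below) would be to add the new tag to the stop list the UI already sends:

```python
# Hypothetical workaround sketch: extend the stop list from the logged
# request with llama3's end-of-turn tag, so the server stops generating
# before the tag can reach the textarea. Only "<|eot_id|>" is new here.
payload = {
    "stream": True,
    "n_predict": 400,
    "temperature": 0.7,
    "stop": ["</s>", "Llama:", "User:", "<|eot_id|>"],  # <-- added entry
    "cache_prompt": True,
    "prompt": "...",  # conversation so far, as the UI builds it
}
```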
Here is some verbose output. I am not sure anymore whether it's llama.cpp or llama3 messing things up, but there are indeed a few too many escape characters in there, and they are all in the prompt, not in the response. Seems strange to me:
{"tid":"140666455310336","timestamp":1715764274,"level":"INFO","function":"log_server_request","line":2856,"msg":"request","remote_addr":"127.0.0.1","remote_port":35878,"status":200,"method":"POST","path":"/completion","params":{}}
{"tid":"140666455310336","timestamp":1715764274,"level":"VERB","function":"log_server_request","line":2865,"msg":"request","request":"{\"stream\":true,\"n_predict\":400,\"temperature\":0.7,\"stop\":[\"</s>\",\"Llama:\",\"User:\"],\"repeat_last_n\":256,\"repeat_penalty\":1.18,\"penalize_nl\":false,\"top_k\":40,\"top_p\":0.95,\"min_p\":0.05,\"tfs_z\":1,\"typical_p\":1,\"presence_penalty\":0,\"frequency_penalty\":0,\"mirostat\":0,\"mirostat_tau\":5,\"mirostat_eta\":0.1,\"grammar\":\"\",\"n_probs\":0,\"min_keep\":0,\"image_data\":[],\"cache_prompt\":true,\"api_key\":\"\",\"prompt\":\"This is a conversation between User and Llama, a friendly chatbot. Llama is helpful, kind, honest, good at writing, and never fails to answer any requests immediately and with precision.\\n\\nUser: Please write a poem about a vicuna\\nLlama: What a delightful request!\\n\\nHere's a poem about the majestic vicuña:\\n\\nIn Andean highlands, where air is thin,\\nA creature roams, with fleece so fine within.\\nThe vicuña, with eyes of gentle gaze,\\nWalks softly on the mountain's ancient ways.\\n\\nIts coat, like silk, in shades of cream and gray,\\nShines bright, as if kissed by dawn's first ray.\\nWith steps quiet, it traverses the land,\\nLeaving behind a trail, so soft and grand.\\n\\nIn herds they roam, these creatures rare,\\nTheir whispers echoing through the mountain air.\\nA symbol of resilience, strong and free,\\nThe vicuña thrives in harmony with nature's glee.\\n\\nOh, let us cherish this Andean delight,\\nThis treasure of the highlands, shining bright!\\nFor in its footsteps, we find our way,\\nTo respect the land, and live in balance each day.\\n\\nI hope you enjoyed this poem! Let me know if I can assist you further.<|eot_id|>\\nUser: What are the steps to solve a rubic's cube?\\nLlama: An excellent question!\\n\\nSolving a Rubik's Cube can be a fascinating puzzle-solving adventure! Here are the steps to help you conquer it:\\n\\n**Step 1: White Cross (Top Surface)**\\n\\n* Hold the cube in front of you with the white cross facing upwards.\\n* Solve the white cross on top by twisting only the top layer.\\n* Use algorithms like \\\"U\\\" (up), \\\"D\\\" (down), \\\"L\\\" (left), and \\\"R\\\" (right) to move pieces around.\\n\\n**Step 2: White Corners**\\n\\n* Once you have the white cross, focus on placing the white corners in their correct positions.\\n* Twist middle and top layers to move corner pieces into place.\\n\\n**Step 3: Yellow Cross (Middle Layer)**\\n\\n* Hold the cube so that one of the already-solved white cross edges is facing upwards.\\n* Solve the yellow cross on the middle layer by twisting only the middle and bottom layers.\\n* Use algorithms like \\\"U\\\", \\\"D\\\", \\\"L\\\", \\\"R\\\", and \\\"F\\\" to\\\"\\n\\n.\\n\\n (\\n\\n\\n\\n)\\n\\n\\n\\n*\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n)\\n\\n\\n\\n\\n]\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n)\\n\\n\\n\\n\\n\\n\\\\\\n)\\n\\n)\\n\\n)\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n)]\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n)\\n\\n\\n\\n\\n\\n\\n\\n\\n)\\n\\n\\n\\n\\n\\n)\\n\\n)\\n\\n**\\n\\n**\\n\\n)\\n\\n<|eot_id|>\\nUser: What are the steps to solve a rubic's cube?\\nLlama:\"}","response":""}
Here it went smoothly, after resetting the conversation:
{"tid":"140667677200384","timestamp":1715764999,"level":"VERB","function":"start_loop","line":513,"msg":"wait for new task"}
{"tid":"140666446917632","timestamp":1715764999,"level":"VERB","function":"operator()","line":3426,"msg":"data stream","to_send":"data: {\"content\":\"\",\"id_slot\":0,\"stop\":true,\"model\":\"/home/peter/repos/Meta-Llama-3-70B-Instruct-GGUF/Meta-Llama-3-70B-Instruct-Q4_K_M.gguf\",\"tokens_predicted\":368,\"tokens_evaluated\":59,\"generation_settings\":{\"n_ctx\":512,\"n_predict\":512,\"model\":\"/home/peter/repos/Meta-Llama-3-70B-Instruct-GGUF/Meta-Llama-3-70B-Instruct-Q4_K_M.gguf\",\"seed\":4294967295,\"temperature\":0.699999988079071,\"dynatemp_range\":0.0,\"dynatemp_exponent\":1.0,\"top_k\":40,\"top_p\":0.949999988079071,\"min_p\":0.05000000074505806,\"tfs_z\":1.0,\"typical_p\":1.0,\"repeat_last_n\":256,\"repeat_penalty\":1.1799999475479126,\"presence_penalty\":0.0,\"frequency_penalty\":0.0,\"penalty_prompt_tokens\":[],\"use_penalty_prompt_tokens\":false,\"mirostat\":0,\"mirostat_tau\":5.0,\"mirostat_eta\":0.10000000149011612,\"penalize_nl\":false,\"stop\":[\"</s>\",\"Llama:\",\"User:\"],\"n_keep\":0,\"n_discard\":0,\"ignore_eos\":false,\"stream\":true,\"logit_bias\":[],\"n_probs\":0,\"min_keep\":0,\"grammar\":\"\",\"samplers\":[\"top_k\",\"tfs_z\",\"typical_p\",\"top_p\",\"min_p\",\"temperature\"]},\"prompt\":\"This is a conversation between User and Llama, a friendly chatbot. Llama is helpful, kind, honest, good at writing, and never fails to answer any requests immediately and with precision.\\n\\nUser: What are the steps to solve a rubic's cube?\\nLlama:\",\"truncated\":false,\"stopped_eos\":true,\"stopped_word\":false,\"stopped_limit\":false,\"stopping_word\":\"\",\"tokens_cached\":426,\"timings\":{\"prompt_n\":59,\"prompt_ms\":5056.9,\"prompt_per_token_ms\":85.71016949152542,\"prompt_per_second\":11.66722695722676,\"predicted_n\":368,\"predicted_ms\":193358.472,\"predicted_per_token_ms\":525.4306304347826,\"predicted_per_second\":1.9032008072550346}}\n\n"}
{"tid":"140666446917632","timestamp":1715764999,"level":"VERB","function":"remove_waiting_task_id","line":577,"msg":"remove waiting for task id","id_task":402}
{"tid":"140666446917632","timestamp":1715764999,"level":"INFO","function":"log_server_request","line":2856,"msg":"request","remote_addr":"127.0.0.1","remote_port":36344,"status":200,"method":"POST","path":"/completion","params":{}}
{"tid":"140666446917632","timestamp":1715764999,"level":"VERB","function":"log_server_request","line":2865,"msg":"request","request":"{\"stream\":true,\"n_predict\":400,\"temperature\":0.7,\"stop\":[\"</s>\",\"Llama:\",\"User:\"],\"repeat_last_n\":256,\"repeat_penalty\":1.18,\"penalize_nl\":false,\"top_k\":40,\"top_p\":0.95,\"min_p\":0.05,\"tfs_z\":1,\"typical_p\":1,\"presence_penalty\":0,\"frequency_penalty\":0,\"mirostat\":0,\"mirostat_tau\":5,\"mirostat_eta\":0.1,\"grammar\":\"\",\"n_probs\":0,\"min_keep\":0,\"image_data\":[],\"cache_prompt\":true,\"api_key\":\"\",\"prompt\":\"This is a conversation between User and Llama, a friendly chatbot. Llama is helpful, kind, honest, good at writing, and never fails to answer any requests immediately and with precision.\\n\\nUser: What are the steps to solve a rubic's cube?\\nLlama:\"}","response":""}
{"tid":"140666446917632","timestamp":1715764999,"level":"VERB","function":"post","line":413,"msg":"new task id","new_id":771}
{"tid":"140666446917632","timestamp":1715764999,"level":"VERB","function":"remove_waiting_task_id","line":577,"msg":"remove waiting for task id","id_task":402}
BTW: I failed to anonymise the output; as you can see, the model path in the log above still contains my home directory.
I discussed matters with llama-3-70B-GGUF directly in the server GUI. I am not sure how I could reproduce that in the server test scenario framework, so you may forgive me for sending you the output of the conversation instead. It is, after all, not an uncommon problem; multiple escaping happens quite often. In the conversation you can find my theory of what's going on, and also the point where the output gets garbled, since llama-3 has a tendency to forget that it promised a few lines earlier not to use escaping.
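To illustrate what I mean by multiple escaping (a sketch of my theory, not the actual UI code): if any layer JSON-escapes a string that was already escaped, the backslashes multiply on every round instead of staying put.

```python
import json

s = "line one\nline two"  # starts with a real newline
for round_no in range(1, 4):
    s = json.dumps(s)  # one more round of escaping
    print(round_no, s)
# 1 "line one\nline two"
# 2 "\"line one\\nline two\""
# 3 "\"\\\"line one\\\\nline two\\\"\""
```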
"Please include information about your system, the steps to reproduce the bug, and the version of llama.cpp that you are using. If possible, please provide a minimal code example that reproduces the bug."
Quite a truckload of information about my system is included in the verbose output above.
"If the bug concerns the server, please try to reproduce it first using the server test scenario framework."
I pretty much failed to find out how I should reproduce it there; even the stock embeddings scenario fails for me:
```
/llama.cpp/examples/server/tests$ ./tests.sh
@llama.cpp @embeddings
Feature: llama.cpp server # features/embeddings.feature:3
Starting new scenario: Embedding!
server is listening on localhost:8080...
HOOK-ERROR in before_scenario: AssertionError: Server already started

  Background: Server startup # features/embeddings.feature:5

  Scenario: Embedding # features/embeddings.feature:19
    Given a server listening on localhost:8080 # None
    And a model url https://huggingface.co/ggml-org/models/resolve/main/bert-bge-small/ggml-model-f16.gguf # None
    And a model file ggml-model-f16.gguf # None
    And a model alias bert-bge-small # None
    And 42 as server seed # None
    And 2 slots # None
    And 1024 as batch size # None
    And 1024 as ubatch size # None
    And 2048 KV cache size # None
    And embeddings extraction # None
    Then the server is starting # None
    Then the server is healthy # None
    When embeddings are computed for # None
      """
      What is the capital of Bulgaria ?
      """
    Then embeddings are generated # None

Failing scenarios:
  features/embeddings.feature:19 Embedding

2 features passed, 2 failed, 0 skipped, 2 untested
0 scenarios passed, 1 failed, 0 skipped, 42 untested
0 steps passed, 0 failed, 0 skipped, 0 undefined, 684 untested
Took 0m0.000s
```
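In case it helps, here is roughly what I did by hand in the GUI, as a standalone script against a running server. This is my own approximation of the web UI's behaviour; the endpoint and parameters are copied from the verbose log above, the rest is an assumption:

```python
import json
import urllib.request

URL = "http://localhost:8080/completion"

def completion(prompt):
    payload = {
        "stream": False,  # the UI streams; non-streaming keeps this simple
        "n_predict": 400,
        "temperature": 0.7,
        "stop": ["</s>", "Llama:", "User:"],
        "cache_prompt": True,
        "prompt": prompt,
    }
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]

history = ("This is a conversation between User and Llama, a friendly "
           "chatbot. Llama is helpful, kind, honest, good at writing, and "
           "never fails to answer any requests immediately and with "
           "precision.\n")

for question in ["Please write a poem about a vicuna",
                 "What are the steps to solve a rubic's cube?"]:
    history += f"\nUser: {question}\nLlama:"
    answer = completion(history)
    # The UI feeds the answer (including any <|eot_id|> the model emitted)
    # back into the next prompt; that is where the garbling starts for me.
    history += answer
    print(answer)
```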