ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Changing prompt(template) in the server web interface does not affect generation when using multimodal mode #3934

Closed Answer-is-not-42 closed 7 months ago

Answer-is-not-42 commented 12 months ago

Environment and Context

Windows 11, latest release of llama.cpp, CUDA 12.2

Failure Information

When using the server web interface with a multimodal model in multimodal mode (LLaVA 1.5 13B in my case), the prompt template set in the interface is not used. This does not happen when I don't send an image.

Also, prompts with images don't seem to be saved in the cache. I'm not sure whether that's related or whether I should open a separate issue.

Steps to Reproduce

  1. Launch server ./server -m ./models/llava-13b-q4_K.gguf --mmproj ./models/mmproj-llava-13b-f16.gguf -ngl 20 -v
  2. Change the prompt, prompt template, and chat history template. In my case I deleted the prompt
  3. Upload an image
  4. Type something and hit send
  5. See console outputting the default prompt
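To isolate whether the web UI or the server is at fault, one can bypass the interface and build the completion request body by hand, with an explicit `prompt` field, and POST it to the server directly. This is a hypothetical sketch, not from the original report: the field names mirror the request shown in the verbose log below, and the image data is a placeholder:

```python
import json

# Hand-built /completion request body, mirroring the fields the server
# web UI sends (field names taken from the verbose server log).
payload = {
    "stream": False,
    "n_predict": 400,
    "cache_prompt": True,
    "slot_id": 0,
    # Placeholder; a real request would carry base64-encoded image bytes.
    "image_data": [{"data": "<base64-image-bytes>", "id": 10}],
    # The custom prompt we expect the server to use verbatim,
    # with no default system prompt prepended:
    "prompt": "USER:[img-10]Describe the image for me.\nASSISTANT:",
}

body = json.dumps(payload)
# POSTing `body` to the server's /completion endpoint (e.g. with curl)
# shows whether the default template still appears in the processed
# prompt even when an explicit "prompt" field is supplied.
print(body)
```

If the default system prompt still shows up in the server console with this request, the injection happens server-side in the multimodal path rather than in the web UI.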

Failure Logs

Output from console when using multimodal capabilities (sending an image):

{"timestamp":1699022762,"level":"VERBOSE","function":"log_server_request","line":2222,"message":"request","request":"{\"stream\":true,\"n_predict\":400,\"temperature\":0.6,\"stop\":[\"</s>\",\"LLaVa:\",\"User:\"],\"repeat_last_n\":256,\"repeat_penalty\":1.13,\"top_k\":40,\"top_p\":1,\"tfs_z\":1,\"typical_p\":1,\"presence_penalty\":0,\"frequency_penalty\":0,\"mirostat\":0,\"mirostat_tau\":5,\"mirostat_eta\":0.1,\"grammar\":\"\",\"n_probs\":0,\"image_data\":[{\"data\":\"I left out the image data\",\"id\":10}],\"cache_prompt\":true,\"slot_id\":0,\"prompt\":\"A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.\\nUSER:[img-10]Describe the image for me.\\nASSISTANT:\"}","response":""}
slot 0 released (135 tokens in cache)

The default prompt is there. (Also, if I resend the same prompt with the same image, both the image and the prompt get reprocessed.) Now, without sending an image, just asking the model to say "test":

{"timestamp":1699025999,"level":"VERBOSE","function":"log_server_request","line":2222,"message":"request","request":"{\"stream\":true,\"n_predict\":400,\"temperature\":0.6,\"stop\":[\"</s>\",\"LLaVa:\",\"User:\"],\"repeat_last_n\":256,\"repeat_penalty\":1.13,\"top_k\":40,\"top_p\":1,\"tfs_z\":1,\"typical_p\":1,\"presence_penalty\":0,\"frequency_penalty\":0,\"mirostat\":0,\"mirostat_tau\":5,\"mirostat_eta\":0.1,\"grammar\":\"\",\"n_probs\":0,\"image_data\":[],\"cache_prompt\":true,\"slot_id\":0,\"prompt\":\"USER: Say \\\"test\\\"\\nASSISTANT: \"}","response":""}
slot 0 released (18 tokens in cache)

No default prompt.

stduhpf commented 11 months ago

I'm having the same issue, and I think it's because the image embeddings take the place of the system prompt or something. Also, I'm not sure if it's directly related, but the image gets re-encoded and added to the context every time the user sends any message, without uploading the image again. I can't engage in a "conversation" about the image, only get the description once.

m-a-sch commented 11 months ago

See #4034 and my suggested fix there (works for me).

github-actions[bot] commented 7 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.