Mozilla-Ocho / llamafile

Distribute and run LLMs with a single file.
https://llamafile.ai

server: multimodal - fix misreported prompt and num prompt tokens #392

Closed cjpais closed 4 months ago

cjpais commented 5 months ago

Ported the code from llama.cpp PR 5896

Should address llama.cpp 5852 and llama.cpp 5863

To fix this, we set the number of processed tokens to its correct value in `ingest_images`, where the prompt is tokenized for multimodal requests.
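As an illustrative sketch (not the actual server code; `ImageSlot`, `tokenize_count`, and `count_prompt_tokens` are hypothetical stand-ins), the idea is that a multimodal prompt's token count must include both the tokenized text segments and the positions consumed by each image embedding, and must be assigned where that tokenization actually happens:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the server's image data: each image slot
// consumes a fixed number of embedding positions in the context.
struct ImageSlot {
    int n_embed_tokens;   // positions consumed by the image embedding
};

// Toy "tokenizer" for illustration: one token per whitespace-separated word.
static int tokenize_count(const std::string &text) {
    int n = 0;
    bool in_word = false;
    for (char c : text) {
        if (c == ' ') { in_word = false; }
        else if (!in_word) { in_word = true; ++n; }
    }
    return n;
}

// The gist of the fix: num_prompt_tokens is computed during multimodal
// ingestion, summing the text segments around the images AND the image
// embeddings themselves, rather than being left at the (wrong) value
// computed for a text-only prompt.
static int count_prompt_tokens(const std::vector<std::string> &text_segments,
                               const std::vector<ImageSlot> &images) {
    int num_prompt_tokens = 0;
    for (const auto &seg : text_segments)
        num_prompt_tokens += tokenize_count(seg);
    for (const auto &img : images)
        num_prompt_tokens += img.n_embed_tokens;
    return num_prompt_tokens;
}
```

With this, the reported count reflects the full multimodal prompt instead of only the text portion.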

Additionally, this fixes the prompt being reported as the empty string in multimodal responses: since the original prompt was cleared during processing, we iteratively rebuild it.
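A minimal sketch of the rebuild step (again hypothetical names, not the PR's exact code): the original prompt can be reconstructed by walking the stored text segments and image ids in order and concatenating them back into one string:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch: the server cleared the prompt string while
// ingesting images, so responses reported "" as the prompt. Rebuild it
// by interleaving the surviving text segments with image placeholders.
static std::string rebuild_prompt(const std::vector<std::string> &segments,
                                  const std::vector<int> &image_ids) {
    std::string prompt;
    size_t n = std::max(segments.size(), image_ids.size());
    for (size_t i = 0; i < n; ++i) {
        // Text segment i comes first, followed by the i-th image marker.
        if (i < segments.size())
            prompt += segments[i];
        if (i < image_ids.size())
            prompt += "[img-" + std::to_string(image_ids[i]) + "]";
    }
    return prompt;
}
```

The interleaving order (segment, then image marker) is an assumption for illustration; the point is only that the prompt is rebuilt from the parts that remain after clearing, rather than returned empty.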