Mozilla-Ocho / llamafile

Distribute and run LLMs with a single file.
https://llamafile.ai

Are embeddings not supported with the mistral-7b-instruct-v0.2 model? #414

Closed norteo closed 1 month ago

norteo commented 1 month ago

I run llamafile with the mistral model as:

./mistral-7b-instruct-v0.2.Q5_K_M.llamafile -ngl 9999 --port 8080 --host 0.0.0.0 --embedding --threads 16

I don't have a GPU.

If I run

curl http://localhost:8080/embedding \
        -H "Authorization: Bearer no-key" \
        -H "Content-Type: application/json" \
        -d '{ "content": "The food was delicious and the waiter..." }'

llamafile "crashes" with the message:

{"function":"launch_slot_with_data","level":"INFO","line":875,"msg":"slot is processing task","slot_id":0,"task_id":0,"tid":"9434528","timestamp":1715505285}
{"function":"update_slots","level":"INFO","line":1890,"msg":"kv cache rm [p0, end)","p0":0,"slot_id":0,"task_id":0,"tid":"9434528","timestamp":1715505285}
llama_get_embeddings_ith: invalid embeddings id 0, reason: batch.logits[0] != true
GGML_ASSERT: llama.cpp/llama.cpp:16631: false
lovenemesis commented 1 month ago

If you don't have a GPU, you probably don't need to pass -ngl 9999.

k8si commented 1 month ago

This should only be an issue in older versions of llamafile. Which version are you running? To find out, you can run

./mistral-7b-instruct-v0.2.Q5_K_M.llamafile --version
norteo commented 1 month ago

Thank you for the reply.

user@fe9e8ccdc306:~$ ./mistral-7b-instruct-v0.2.Q5_K_M.llamafile --version
llamafile v0.8.0

It seems I was not using the latest version. I re-downloaded the file from https://huggingface.co/Mozilla/Mistral-7B-Instruct-v0.2-llamafile/resolve/main/mistral-7b-instruct-v0.2.Q5_K_M.llamafile?download=true and re-ran the curl command, and it now works fine. The version I am using now is 0.8.5.

The Hugging Face repository does not seem to have the latest version, and there seems to be no way to tell which llamafile version you are downloading. Maybe something should be done about that?
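For anyone landing here: once the embedding endpoint works, a common next step is comparing embeddings between texts. Below is a minimal sketch in Python. It assumes a llamafile server running locally with --embedding, and assumes the response JSON contains an "embedding" field holding the vector (the exact response shape may vary by llamafile version, so check your server's output first).

```python
import json
import math
import urllib.request


def fetch_embedding(text, url="http://localhost:8080/embedding"):
    """POST `text` to a running llamafile server and return the embedding
    vector. Assumes the response looks like {"embedding": [...]}."""
    req = urllib.request.Request(
        url,
        data=json.dumps({"content": text}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer no-key",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Typical usage would be something like `cosine_similarity(fetch_embedding("a"), fetch_embedding("b"))` to score how related two strings are.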

k8si commented 1 month ago

> The Hugging Face repository does not seem to have the latest version, and there seems to be no way to tell which llamafile version you are downloading. Maybe something should be done about that?

Would you be willing to post this as a separate issue?