ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Streamline embeddings from "non-embedding" models #8087

Open iamlemec opened 5 days ago

iamlemec commented 5 days ago

The goal here is to get the big embedding models at the top of the MTEB leaderboard working. There are two changes:

With this PR, we can get accurate results (matching HF) from at least the #2 model on the leaderboard, gte-Qwen2-7B-instruct. For instance, with the command:

./llama-embedding -m gte-qwen2-7b-instruct-f16.gguf -p "hello world" -ngl 99 --pooling last --attention non-causal -c 512
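
For reference, here is a minimal sketch of how the HF-side embedding could be computed to compare against the llama-embedding output. The HF repo id, last-token pooling code, and normalization step are assumptions made for illustration; they are not part of this PR.

```python
# Sketch: compute a reference embedding with transformers for comparison.
# Assumes the model is hosted at "Alibaba-NLP/gte-Qwen2-7B-instruct" and that
# the intended pooling is last-token pooling (mirroring --pooling last above).
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "Alibaba-NLP/gte-Qwen2-7B-instruct"  # assumed HF repo id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

inputs = tok("hello world", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (batch, seq_len, dim)

# Last-token pooling: take the hidden state of the final non-padding token.
last_idx = inputs["attention_mask"].sum(dim=1) - 1
emb = hidden[torch.arange(hidden.size(0)), last_idx]
emb = torch.nn.functional.normalize(emb, dim=-1)  # assumed L2 normalization
print(emb.shape)
```

The resulting vector can then be compared (e.g. by cosine similarity) with the embedding printed by llama-embedding to check that the two implementations agree.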


iamlemec commented 1 day ago

@compilade cool! just rebased to master