iamlemec opened 5 days ago
The goal here is to get the big embedding models at the top of the MTEB leaderboard working. There are two changes:

- `batch.logits` is fully ignored for pooled embeddings.
- Adds `attention_type` to `llama_context_params`, allowing causal, non-causal, or unspecified (model default). See the sketch after this list.
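To illustrate the second change, here is a minimal sketch (not the PR diff itself) of requesting pooled, non-causal embeddings through `llama_context_params`. The `attention_type` field and enum spellings are assumptions based on this description; check `llama.h` for the exact names.

```c
#include "llama.h"

// Create a context that returns pooled, non-causal embeddings.
// Field and enum names for the attention override are assumed from this PR.
static struct llama_context * make_embedding_ctx(struct llama_model * model) {
    struct llama_context_params cparams = llama_context_default_params();
    cparams.embeddings     = true;                            // return embeddings instead of logits
    cparams.pooling_type   = LLAMA_POOLING_TYPE_LAST;         // last-token pooling, as in the command below
    cparams.attention_type = LLAMA_ATTENTION_TYPE_NON_CAUSAL; // new: override the model default
    return llama_new_context_with_model(model, cparams);
}
```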
With this PR, we can get accurate results (matching HF) from at least the number 2 spot, `gte-Qwen2-7B-instruct`. For instance, with the command:

```sh
./llama-embedding -m gte-qwen2-7b-instruct-f16.gguf -p "hello world" -ngl 99 --pooling last --attention non-causal -c 512
```
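On the API side, since `batch.logits` is ignored for pooled embeddings, a consumer reads one vector per sequence rather than per-token outputs. A minimal sketch, assuming the existing `llama_get_embeddings_seq` and `llama_n_embd` accessors:

```c
#include <stdio.h>
#include "llama.h"

// Print the pooled embedding for one sequence after llama_decode has run.
// With pooling enabled, the per-token logits flags in the batch play no role.
static void print_pooled_embedding(struct llama_context * ctx,
                                   const struct llama_model * model,
                                   llama_seq_id seq_id) {
    const int     n_embd = llama_n_embd(model);
    const float * emb    = llama_get_embeddings_seq(ctx, seq_id);
    for (int i = 0; i < n_embd; i++) {
        printf("%.6f ", emb[i]);
    }
    printf("\n");
}
```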
@compilade cool! just rebased to master