abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

Different embedding results between LlamaCppEmbeddings vs server? #975

Open Manouchehri opened 10 months ago

Manouchehri commented 10 months ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

Expected Behavior

These two Python scripts should return the same vector.

from langchain.embeddings import LlamaCppEmbeddings

input_size = 5251
llama = LlamaCppEmbeddings(
    model_path="/home/david_manouchehri/CodeLlama-7b_ggml-model-f16.gguf",
    n_gpu_layers=1500,
    n_ctx=input_size,
    n_batch=input_size,
)
text = "I am a tomato."
results = llama.embed_documents([text])
print(results)

vs.

python3 -m llama_cpp.server --model ~/CodeLlama-7b_ggml-model-f16.gguf --n_ctx 5251 --n_batch 5251 --host 127.0.0.1 --port 52193 --n_gpu_layers 1500
import http.client
import json

conn = http.client.HTTPConnection("127.0.0.1", 52193)
text = "I am a tomato."
payload = {
    "input": [text]
}

headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
}

conn.request("POST", "/v1/embeddings", json.dumps(payload), headers)

res = conn.getresponse()
data = res.read()
decoded = data.decode("utf-8")
results_decoded = json.loads(decoded)

results = [result["embedding"] for result in results_decoded["data"]]

print(results)

Current Behavior

Right now, these two scripts return different vectors, and I'm not sure why; given the same model and input, they should be identical.

python3 server_version.py >> server_version.json
python3 langchain_version.py >> langchain_version.json
md5sum server_version.json langchain_version.json
# 74fd5e1d73f95d7d54d6f237a1002661  server_version.json
# 7d40fa1ceb8676ad16c61f69eedcca64  langchain_version.json
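One caveat about the comparison method: md5sum flags any byte-level difference, so even a tiny floating-point drift between two otherwise-equivalent embeddings shows up as a mismatch. A minimal tolerance-based comparison is sketched below; the toy vectors stand in for the contents of the two JSON files, and the 1e-5 threshold is an arbitrary choice, not anything from llama.cpp:

```python
def max_abs_diff(a, b):
    """Largest element-wise difference between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors have different lengths")
    return max(abs(x - y) for x, y in zip(a, b))

# Toy values standing in for server_version.json / langchain_version.json
server_vec = [0.1234567, -0.7654321, 0.5]
langchain_vec = [0.1234568, -0.7654320, 0.5]

diff = max_abs_diff(server_vec, langchain_vec)
print("max abs diff:", diff)
print("identical within 1e-5:", diff < 1e-5)
```

If the difference is on the order of float rounding, the two code paths are probably doing the same computation; a large difference points at genuinely different inputs or settings.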

I'm unsure if this is the same issue as https://github.com/ggerganov/llama.cpp/issues/3287. Still looking into it, just opening this ticket so I and others have something to reference.

abetlen commented 10 months ago

@Manouchehri can you try setting the seed parameter in both?
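For reference, a sketch of what setting the seed on both sides might look like; this assumes the LangChain wrapper forwards a `seed` kwarg to llama_cpp.Llama and that the server exposes a `--seed` flag, so treat both names as assumptions to verify against the installed versions:

```python
# LangChain side: pass seed when constructing the embedder
# (assumes LlamaCppEmbeddings accepts and forwards a `seed` kwarg)
from langchain.embeddings import LlamaCppEmbeddings

llama = LlamaCppEmbeddings(
    model_path="/home/david_manouchehri/CodeLlama-7b_ggml-model-f16.gguf",
    n_gpu_layers=1500,
    n_ctx=5251,
    n_batch=5251,
    seed=42,  # assumed parameter name
)

# Server side: start with the same seed (assumes a --seed flag)
# python3 -m llama_cpp.server --model ~/CodeLlama-7b_ggml-model-f16.gguf \
#     --n_ctx 5251 --n_batch 5251 --host 127.0.0.1 --port 52193 \
#     --n_gpu_layers 1500 --seed 42
```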