abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

Different embedding results between LlamaCppEmbeddings vs server? #975

Open Manouchehri opened 10 months ago

Manouchehri commented 10 months ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

Expected Behavior

These two Python scripts should return the same vector.

from langchain.embeddings import LlamaCppEmbeddings

input_size = 5251
llama = LlamaCppEmbeddings(
    model_path="/home/david_manouchehri/CodeLlama-7b_ggml-model-f16.gguf",
    n_gpu_layers=1500,
    n_ctx=input_size,
    n_batch=input_size,
)
text = "I am a tomato."
results = llama.embed_documents([text])
print(results)

vs.

python3 -m llama_cpp.server --model ~/CodeLlama-7b_ggml-model-f16.gguf --n_ctx 5251 --n_batch 5251 --host 127.0.0.1 --port 52193 --n_gpu_layers 1500
import http.client
import json

conn = http.client.HTTPConnection("127.0.0.1", 52193)
text = "I am a tomato."
payload = {
    "input": [text]
}

headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
}

conn.request("POST", "/v1/embeddings", json.dumps(payload), headers)

res = conn.getresponse()
data = res.read()
decoded = data.decode("utf-8")
results_decoded = json.loads(decoded)

results = [result["embedding"] for result in results_decoded["data"]]

print(results)

Current Behavior

Right now, these two scripts return different vectors, and I'm not sure why; given the same model and input, they should be identical.

python3 server_version.py >> server_version.json
python3 langchain_version.py >> langchain_version.json
md5sum server_version.json langchain_version.json
# 74fd5e1d73f95d7d54d6f237a1002661  server_version.json
# 7d40fa1ceb8676ad16c61f69eedcca64  langchain_version.json
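One caveat about the comparison method: md5sum flags any byte-level difference, so even a tiny floating-point drift between two otherwise-equivalent embeddings shows up as a mismatch. A minimal tolerance-based comparison is sketched below; the toy vectors stand in for the contents of the two JSON files, and the 1e-5 threshold is an arbitrary choice, not anything from llama.cpp:

```python
def max_abs_diff(a, b):
    """Largest element-wise difference between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors have different lengths")
    return max(abs(x - y) for x, y in zip(a, b))

# Toy values standing in for server_version.json / langchain_version.json
server_vec = [0.1234567, -0.7654321, 0.5]
langchain_vec = [0.1234568, -0.7654320, 0.5]

diff = max_abs_diff(server_vec, langchain_vec)
print("max abs diff:", diff)
print("identical within 1e-5:", diff < 1e-5)
```

If the difference is on the order of float rounding, the two code paths are probably doing the same computation; a large difference points at genuinely different inputs or settings.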

I'm unsure if this is the same issue as https://github.com/ggerganov/llama.cpp/issues/3287. Still looking into it, just opening this ticket so I and others have something to reference.

abetlen commented 10 months ago

@Manouchehri can you try setting the seed parameter in both?
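For reference, a sketch of what setting the seed on both sides might look like; this assumes the LangChain wrapper forwards a `seed` kwarg to llama_cpp.Llama and that the server exposes a `--seed` flag, so treat both names as assumptions to verify against the installed versions:

```python
# LangChain side: pass seed when constructing the embedder
# (assumes LlamaCppEmbeddings accepts and forwards a `seed` kwarg)
from langchain.embeddings import LlamaCppEmbeddings

llama = LlamaCppEmbeddings(
    model_path="/home/david_manouchehri/CodeLlama-7b_ggml-model-f16.gguf",
    n_gpu_layers=1500,
    n_ctx=5251,
    n_batch=5251,
    seed=42,  # assumed parameter name
)

# Server side: start with the same seed (assumes a --seed flag)
# python3 -m llama_cpp.server --model ~/CodeLlama-7b_ggml-model-f16.gguf \
#     --n_ctx 5251 --n_batch 5251 --host 127.0.0.1 --port 52193 \
#     --n_gpu_layers 1500 --seed 42
```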