abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

When using LlamaCppEmbeddings to embed a gguf type model, an error is reported #910

Open chengjia604 opened 10 months ago

chengjia604 commented 10 months ago

I used the latest module, and while embedding the gguf model into Chroma a critical error occurred:

llamaem = LlamaCppEmbeddings(model_path="D:\models\llama-2-7b-chat.Q4_K_M.gguf")
vectorstore = Chroma.from_documents(documents=all_splits, embedding=llamaem)

error:

File "d:/project/python/document-GPT/test.py", line 71, in <module>
    vectorstore = Chroma.from_documents(documents=all_splits, embedding=llamaem)  # embed
File "D:\python\lib\site-packages\langchain\vectorstores\chroma.py", line 771, in from_documents
    return cls.from_texts(
File "D:\python\lib\site-packages\langchain\vectorstores\chroma.py", line 729, in from_texts
    chroma_collection.add_texts(
File "D:\python\lib\site-packages\langchain\vectorstores\chroma.py", line 275, in add_texts
    embeddings = self._embedding_function.embed_documents(texts)
File "D:\python\lib\site-packages\langchain\embeddings\llamacpp.py", line 113, in embed_documents
    embeddings = [self.client.embed(text) for text in texts]
File "D:\python\lib\site-packages\langchain\embeddings\llamacpp.py", line 113, in <listcomp>
    embeddings = [self.client.embed(text) for text in texts]
File "D:\python\lib\site-packages\llama_cpp\llama.py", line 1292, in embed
    return list(map(float, self.create_embedding(input)["data"][0]["embedding"]))
File "D:\python\lib\site-packages\llama_cpp\llama.py", line 1256, in create_embedding
    self.eval(tokens)
File "D:\python\lib\site-packages\llama_cpp\llama.py", line 1030, in eval
    self._ctx.decode(self._batch)
File "D:\python\lib\site-packages\llama_cpp\llama.py", line 471, in decode
    raise RuntimeError(f"llama_decode returned {return_code}")
RuntimeError: llama_decode returned 1

bsridatta commented 10 months ago

I am also running into this. There are other related issues as well, indicating a memory leak, etc. I hope someone can look into this.

lithces commented 10 months ago

I can confirm this. I am using vicuna-7b-16k gguf Q5_K_M, and there is still GPU VRAM left. It seems to be triggered by long input + embeddings.

abetlen commented 10 months ago

Sorry to get to this so late, I'll take a look!

shashankshekhardehradun commented 10 months ago

What type of split are you performing on the document? I was also getting the same error but changing the splitter to RecursiveCharacterTextSplitter solved the issue.
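Something like this, for example (a sketch; the chunk_size and chunk_overlap values are just illustrative, and should stay well under the embedding model's context window):

from langchain.text_splitter import RecursiveCharacterTextSplitter

# chunk_size/chunk_overlap are example values; keep chunks well under
# the model's n_ctx so a single chunk can't overflow the context window
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
all_splits = splitter.split_documents(docs)  # docs: the documents you loaded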

bsridatta commented 10 months ago

This is how I was trying it, and it failed:

from langchain.document_loaders import PyPDFLoader

# load the PDF and split it with the loader's default splitter
loader = PyPDFLoader("example_data/layout-parser-paper.pdf")
pages = loader.load_and_split()

EssamMohamedAbo-ElMkarem commented 9 months ago

Still getting the same error. I'm using CharacterTextSplitter.

shashankshekhardehradun commented 9 months ago

Can you try changing it to RecursiveCharacterTextSplitter?

GluttonousCat commented 9 months ago

I ran into the same problem with embeddings, but I fixed it by modifying the param n_ctx=4096. It seems the long text causes the problem:

llama = Llama(model_path='./llama-2-7b.Q4_K_M.gguf', embedding=True, n_ctx=4096, n_gpu_layers=30)
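If you're going through LangChain like the original post, the same parameter can be passed to LlamaCppEmbeddings. A minimal sketch, assuming your langchain version exposes n_ctx on that class (the model path is the OP's):

from langchain.embeddings import LlamaCppEmbeddings

llamaem = LlamaCppEmbeddings(
    model_path=r"D:\models\llama-2-7b-chat.Q4_K_M.gguf",
    n_ctx=4096,  # enlarge the context window so long chunks fit in one batch
)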

deekshith-rj commented 9 months ago

I think ChromaDB doesn't support the LlamaCppEmbeddings feature of LangChain. Check out the embedding integrations it supports in the link below. Apparently, we need to create a custom EmbeddingFunction class (also shown in the link below) to use unsupported embeddings APIs.

https://docs.trychroma.com/embeddings#custom-embedding-functions
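Roughly along these lines; an untested sketch combining the custom-embedding-function pattern from the Chroma docs with llama-cpp-python's Llama.embed (the class name and n_ctx value are my own choices):

from chromadb import Documents, EmbeddingFunction, Embeddings
from llama_cpp import Llama

class LlamaCppEmbeddingFunction(EmbeddingFunction):
    def __init__(self, model_path: str):
        # embedding=True is required, otherwise the model won't produce embeddings
        self._model = Llama(model_path=model_path, embedding=True, n_ctx=4096)

    def __call__(self, input: Documents) -> Embeddings:
        # one embedding vector per input text
        return [self._model.embed(text) for text in input]

An instance of this could then be passed as the embedding_function argument when creating a Chroma collection.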

enzolutions commented 3 months ago

The link provided by @deekshith-rj now returns a 404; I think the new page is https://docs.trychroma.com/guides/embeddings#custom-embedding-functions