chengjia604 opened 1 year ago
I am also running into this; there are other related issues as well, indicating a memory leak, etc. I hope someone can look into this.
I can confirm this. I am using a vicuna-7b-16k GGUF Q5_K_M model, and there is still GPU VRAM left. It seems to be triggered by long input + embeddings.
Sorry to get to this so late, I'll take a look!
What type of split are you performing on the document? I was also getting the same error but changing the splitter to RecursiveCharacterTextSplitter solved the issue.
This is how I was trying it, and it failed:
from langchain.document_loaders import PyPDFLoader
loader = PyPDFLoader("example_data/layout-parser-paper.pdf")
pages = loader.load_and_split()
Still getting the same error. I'm using CharacterTextSplitter
Can you try changing it to RecursiveCharacterTextSplitter?
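Something along these lines worked for me; just a sketch, and the chunk_size/chunk_overlap values are placeholders I picked, not anything specific to this issue:

from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = PyPDFLoader("example_data/layout-parser-paper.pdf")
pages = loader.load()

# Recursive splitting falls back through separators ("\n\n", "\n", " ", "")
# so chunks stay close to chunk_size instead of growing past the model's context window.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
all_splits = splitter.split_documents(pages)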
I ran into the same problem with embeddings, but I fixed it by modifying the param n_ctx=4096. It seems the long text causes the problem.
llama = Llama(model_path='./llama-2-7b.Q4_K_M.gguf', embedding=True, n_ctx=4096, n_gpu_layers=30)
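If you are going through LangChain's LlamaCppEmbeddings rather than llama_cpp directly, I believe the same parameter can be passed there too (untested on my side; the model path is just a placeholder):

from langchain.embeddings import LlamaCppEmbeddings

# n_ctx needs to cover the longest chunk you embed; 4096 is just the value that worked for me.
llamaem = LlamaCppEmbeddings(
    model_path="./llama-2-7b.Q4_K_M.gguf",
    n_ctx=4096,
)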
I think Chromadb doesn't support LangChain's LlamaCppEmbeddings feature. Check out the embeddings integrations it supports in the link below. Apparently, we need to create a custom EmbeddingFunction class (also shown in the link below) to use unsupported embeddings APIs.
https://docs.trychroma.com/embeddings#custom-embedding-functions
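For reference, a custom embedding function along these lines should work; this is only a sketch based on that doc page (the class name and n_ctx value are mine, and depending on your chromadb version the __call__ argument may be named texts instead of input):

from chromadb import Documents, EmbeddingFunction, Embeddings
from llama_cpp import Llama

class LlamaCppEmbeddingFunction(EmbeddingFunction):
    def __init__(self, model_path: str, n_ctx: int = 4096):
        # embedding=True is required so llama.cpp returns embeddings instead of generations.
        self.llm = Llama(model_path=model_path, embedding=True, n_ctx=n_ctx)

    def __call__(self, input: Documents) -> Embeddings:
        # Chroma passes a list of texts and expects one vector per text back.
        return [self.llm.embed(text) for text in input]

You would then pass an instance of it as embedding_function when creating the Chroma collection.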
The link provided by @deekshith-rj now returns a 404; I think the new page is https://docs.trychroma.com/guides/embeddings#custom-embedding-functions
I used the latest module, and while embedding with the GGUF model into Chroma, a critical error occurred:
llamaem = LlamaCppEmbeddings(model_path="D:\models\llama-2-7b-chat.Q4_K_M.gguf")
vectorstore = Chroma.from_documents(documents=all_splits, embedding=llamaem)
error:
File "d:/project/python/document-GPT/test.py", line 71, in <module> vectorstore = Chroma.from_documents(documents=all_splits, embedding=llamaem)#嵌入 File "D:\python\lib\site-packages\langchain\vectorstores\chroma.py", line 771, in from_documents return cls.from_texts( File "D:\python\lib\site-packages\langchain\vectorstores\chroma.py", line 729, in from_texts chroma_collection.add_texts( File "D:\python\lib\site-packages\langchain\vectorstores\chroma.py", line 275, in add_texts embeddings = self._embedding_function.embed_documents(texts) File "D:\python\lib\site-packages\langchain\embeddings\llamacpp.py", line 113, in embed_documents embeddings = [self.client.embed(text) for text in texts] File "D:\python\lib\site-packages\langchain\embeddings\llamacpp.py", line 113, in <listcomp> embeddings = [self.client.embed(text) for text in texts] File "D:\python\lib\site-packages\llama_cpp\llama.py", line 1292, in embed return list(map(float, self.create_embedding(input)["data"][0]["embedding"])) File "D:\python\lib\site-packages\llama_cpp\llama.py", line 1256, in create_embedding self.eval(tokens) File "D:\python\lib\site-packages\llama_cpp\llama.py", line 1030, in eval self._ctx.decode(self._batch) File "D:\python\lib\site-packages\llama_cpp\llama.py", line 471, in decode raise RuntimeError(f"llama_decode returned {return_code}") RuntimeError: llama_decode returned 1