aigeek0x0 / zephyr-7b-alpha-langchain-chatbot

Chat with PDF using Zephyr 7B Alpha, Langchain, ChromaDB, and Gradio with Free Google Colab
Apache License 2.0

A sort of... "hallucination"? Help, please! #1

Closed MatteoRiva95 closed 7 months ago

MatteoRiva95 commented 7 months ago

Hello everyone,

I am Matteo and I am currently working on an AI project where the idea is to feed a large language model thousands of English PDFs (around 100k, all on the same topic) and then chat with it. I followed the Colab notebook step by step, but unfortunately, when I ask the model something, it produces a sort of "hallucination" :( I mean, it gives some correct information, but also erroneous information (for example, the title of the PDF is correct, but the years or URLs are wrong)! I do not really understand what is going on! Is it too much information for the model (for testing, I am only using 500 PDFs for now)? Is the chunk size wrong (I am using chunk_size=1000, chunk_overlap=0)? What am I missing? Apologies, I am a beginner, so maybe I am making some mistakes...
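For context, the splitting step in my code looks roughly like this (a sketch using the standard LangChain splitter, which may differ slightly from the notebook's exact code; `documents` is the output of the PDF loader):

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split the loaded PDF pages into chunks before embedding.
# chunk_overlap=0 means adjacent chunks share no text; a small
# overlap (e.g. 100-200 characters) often helps preserve context
# that would otherwise be cut mid-sentence.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
```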

I also tried adding a prompt template...

```python
from langchain_core.prompts import PromptTemplate

template = """You are a chatbot tasked with responding to questions about the documentation.
You should never answer a question with a question, and you should always respond with the most relevant documentation page.
Do not answer questions that are not about the documentation. If the question is not about the documentation, politely inform them that you are tuned to only answer questions about the documentation.
Read the provided context and reply to the user question. If you don't know the answer from the context, reply only "I don't know" and nothing else. Don't try to make up an answer.
If you know the answer from the context, always give a detailed reference for your answer.
Given a question, you should formally respond with the most relevant documentation page by following the relevant context.

Question: {question}

{context}
========="""

QA_PROMPT = PromptTemplate(template=template, input_variables=["question", "context"])
```

...to the ConversationalRetrievalChain.from_llm function:

```python
qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,
    get_chat_history=lambda h: h,
    combine_docs_chain_kwargs={"prompt": QA_PROMPT},
)
```

But nothing changed at all! :( Finally, I take the opportunity to ask whether you know a method to save the model after RAG, so that I do not have to repeat the whole procedure every time.

Can someone help me, please? I am pretty desperate and I do not know how to properly solve it. Any help would be really appreciated! Thank you so much in advance!

Matteo

aigeek0x0 commented 7 months ago

Hallucinations are a common issue with LLMs in general, and with smaller models in particular. If you're experiencing this problem, it's essential to assess your retrieval quality first.
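A quick way to do that, before reaching for any tooling, is to print what the retriever returns for a test question (a minimal sketch; `retriever` is the one you pass to the chain):

```python
# Inspect the context the chain actually sees; hallucinated details
# often trace back to irrelevant or truncated retrieved chunks.
docs = retriever.get_relevant_documents("your test question")
for doc in docs:
    print(doc.metadata.get("source"), "->", doc.page_content[:200])
```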

I suggest using LangSmith by LangChain to examine the context being retrieved. This notebook relies on Naive RAG techniques, but you might want to explore Advanced RAG techniques, incorporating cross-encoder re-ranking and hybrid search, to enhance retrieval quality.
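As a sketch of the re-ranking idea (not from this notebook; it uses the `sentence-transformers` CrossEncoder, and the model name is just one common choice): retrieve generously, then let a cross-encoder score each (question, chunk) pair and keep the best few.

```python
from sentence_transformers import CrossEncoder

# Cross-encoders score query/document pairs jointly, which is slower
# than bi-encoder retrieval but much better at ranking relevance.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(question, docs, top_k=4):
    scores = reranker.predict([(question, d.page_content) for d in docs])
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return [d for d, _ in ranked[:top_k]]

# Retrieve more candidates than you need, then keep the best-scoring ones:
# candidates = retriever.get_relevant_documents(question)  # e.g. k=20
# context_docs = rerank(question, candidates)
```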

Considering the substantial size of your text corpus, incorporating metadata filtering is also advisable.
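For example, if each chunk is stored with metadata (Chroma keeps whatever fields the loader attaches, such as the source file), the retriever can be restricted before the semantic search runs. A minimal sketch, with a hypothetical `year` field:

```python
# Only chunks whose metadata matches the filter are searched semantically.
# The "year" key is hypothetical; use whatever fields your PDFs carry.
retriever = vectordb.as_retriever(
    search_kwargs={"k": 4, "filter": {"year": 2021}}
)
```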

MatteoRiva95 commented 7 months ago

@aigeek0x0 Thank you so much for your reply! Lovely! Just a few questions, then I won't bother you anymore: I tested RAG (HuggingFace + Zephyr 7B Alpha) with 60k PDFs, but it was really, really slow. It took 10 minutes to answer a single question! :(

Do you think RAG is the best solution for my project? Could metadata filtering fix this issue? Or maybe fine-tuning? Please, can you give me your suggestions on these last questions? I would really appreciate it!

Thank you again and have a nice day!

aigeek0x0 commented 7 months ago

To be honest, I haven't worked on any project that involved running semantic search across a dataset of over 60k PDFs for each individual query.

I recommend using metadata filtering first to narrow down the PDFs based on predefined criteria. You can then run the semantic search on the filtered subset.

If you find that the response time is still prolonged, consider exploring managed services like Pinecone or Weaviate for more efficient processing.
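As for the earlier question about saving the work: the expensive part is embedding and indexing, and Chroma can persist that index to disk so it is built only once. A minimal sketch (the directory name is just an example; `texts` and `embeddings` are the chunks and embedding model from the indexing step):

```python
from langchain.vectorstores import Chroma

# First run: embed the chunks and persist the index to disk.
vectordb = Chroma.from_documents(texts, embeddings, persist_directory="chroma_db")
vectordb.persist()

# Later runs: reload the persisted index instead of re-embedding the PDFs.
vectordb = Chroma(persist_directory="chroma_db", embedding_function=embeddings)
retriever = vectordb.as_retriever(search_kwargs={"k": 4})
```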