google-deepmind / gemma

Open weights LLM from Google DeepMind.
http://ai.google.dev/gemma
Apache License 2.0

Inconsistencies with Gemma RAG with LangChain #21

Closed: hemanth closed this issue 3 months ago

hemanth commented 3 months ago

Trying RAG with Gemma and LangChain:

from langchain import hub
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import HuggingFacePipeline
from langchain_community.vectorstores import FAISS
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load and split documents
loader = WebBaseLoader("https://h3manth.com")
data = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)

# Create vector store
vectorstore = FAISS.from_documents(documents=all_splits, embedding=HuggingFaceEmbeddings())

# Load RAG prompt
prompt = hub.pull("rlm/rag-prompt")

# Create LLM
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=1000,
    temperature=0.1,
    top_p=0.95,
    repetition_penalty=1.15,
    do_sample=True,
)
llm = HuggingFacePipeline(pipeline=pipe)

# Create RetrievalQA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm, 
    retriever=vectorstore.as_retriever(), 
    chain_type="stuff",  # Specify chain type
    chain_type_kwargs={"prompt": prompt}
)

# Ask question
question = "List the References"
response = qa_chain({"query": question})

print(response["result"])

This doesn't return any content, or sometimes just random text. Is something missing in the pipeline?

Here's a notebook demonstrating the same.
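
One way to narrow down where a chain like this goes silent is to test the retriever and the raw pipeline separately. A minimal debugging sketch, assuming the objects defined above (this snippet is not part of the original report):

# Check that the retriever actually returns relevant chunks for the question.
docs = vectorstore.as_retriever().get_relevant_documents("List the References")
for doc in docs:
    print(doc.page_content[:200])

# Check that the raw pipeline generates text at all, outside the chain.
print(pipe("What is retrieval-augmented generation?")[0]["generated_text"])

If the retriever returns empty or irrelevant chunks, the problem is in loading/splitting; if the pipeline output is garbage, the problem is in the generation settings or the prompt.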

hemanth commented 3 months ago

Fixed it! It needed a bit more prompt engineering :)
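
The final prompt isn't shown in the thread, but one plausible adjustment is replacing the generic rlm/rag-prompt with a template written in Gemma's turn format. A sketch only: the <start_of_turn> markers apply to the instruction-tuned google/gemma-2b-it checkpoint, which is an assumption here, since the base gemma-2b model has no chat format:

from langchain.prompts import PromptTemplate

# Hypothetical prompt template -- the actual fix is not shown in the thread.
template = """<start_of_turn>user
Answer the question using only the context below. If the context does not
contain the answer, say so.

Context:
{context}

Question: {question}<end_of_turn>
<start_of_turn>model
"""
prompt = PromptTemplate(template=template, input_variables=["context", "question"])

# Rebuild the chain with the custom prompt.
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    chain_type="stuff",
    chain_type_kwargs={"prompt": prompt},
)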

ajpalec commented 3 months ago

Does your pipeline object return text when using the Gemma models? I'm not sure what the issue is for me, but I'm having trouble for whatever reason, even though the tokenizer and model seem to be giving the right outputs.
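
One common cause of empty-looking output worth checking (an assumption here, since the comment includes no code): by default the text-generation pipeline returns the prompt concatenated with the completion, which downstream parsing may treat as no new content. Setting return_full_text=False limits the output to newly generated tokens. A sketch reusing the model and tokenizer from above:

# First confirm the pipeline itself produces text.
out = pipe("Write one sentence about the Gemma models.")
print(out[0]["generated_text"])

# Rebuild the pipeline so it returns only newly generated tokens,
# not the prompt echoed back at the start of the string.
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=256,
    return_full_text=False,
)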