[ ] I checked the documentation and related resources and couldn't find an answer to my question.
Your Question
How can I evaluate my RAG (Retrieval-Augmented Generation) pipeline built with Transformers, using embeddings from Hugging Face? The documentation shows how to evaluate with your own LLM and your own embeddings, and also how to integrate RAG with Chain, but not how to do both at the same time. I don't want to use the OpenAI API.
Code Examples

```python
import torch
from torch import cuda
from transformers import pipeline
from langchain import HuggingFacePipeline
from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceEmbeddings

# `model` and `tokenizer` are assumed to have been loaded earlier
generate_text = pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,
    task='text-generation',
    do_sample=True,
    top_p=0.98,
    temperature=0.4,
    max_new_tokens=512,
    repetition_penalty=1.1,
)

llm = HuggingFacePipeline(pipeline=generate_text)

# `vectorstore` is assumed to have been built earlier
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type='stuff',
    retriever=vectorstore.as_retriever(search_kwargs={"k": 2}),
)

embed_model_id = "distiluse-base-multilingual-cased-v1"
device = f"cuda:{cuda.current_device()}" if cuda.is_available() else "cpu"
embed_model = HuggingFaceEmbeddings(
    model_name=embed_model_id,
    model_kwargs={"device": device},
    encode_kwargs={"device": device, "batch_size": 32},
)
```