explodinggradients / ragas


How to use local large models and embedding models to generate test datasets for ragas 0.2? #1497

Open Z-oo883 opened 4 days ago

dosubot[bot] commented 4 days ago

I found some relevant discussions and solutions for using local large models and embedding models to generate test datasets in Ragas.

  1. Using Local LLMs: To use the Ragas package without relying on OpenAI, wrap your local LLM in LangchainLLMWrapper and your local embeddings in LangchainEmbeddingsWrapper, and make sure the default OpenAI embeddings used by the transforms are replaced. Here's a brief example:

    from langchain_ollama import ChatOllama, OllamaEmbeddings
    from ragas.llms import LangchainLLMWrapper
    from ragas.embeddings import LangchainEmbeddingsWrapper
    from ragas.testset import TestsetGenerator
    
    # Wrap your local LLM and a local embedding model
    # ("nomic-embed-text" is one example of an Ollama embedding model)
    generator_llm = LangchainLLMWrapper(ChatOllama(model="gemma2:2b"))
    generator_embeddings = LangchainEmbeddingsWrapper(OllamaEmbeddings(model="nomic-embed-text"))
    
    # Initialize the TestsetGenerator with your local LLM and embeddings
    generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
    
    # Generate the testset (`docs` is your list of LangChain documents)
    dataset = generator.generate_with_langchain_docs(docs, testset_size=10)

    This setup ensures that your local models are used throughout and that the default OpenAI embeddings are not applied [1]. If you need to control the transforms stage explicitly, see the sketch after this list.

  2. TestsetGenerator with Custom LLMs: It is possible to create a TestsetGenerator with any Langchain LLM; you can check the linked code to achieve this, and a sketch follows this list. However, be aware that this feature might change in the future, as testset generation is currently a beta feature [2].
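For the transforms stage from point 1, here is a minimal sketch, assuming your installed ragas 0.2 release exposes the transforms_llm and transforms_embedding_model parameters on generate_with_langchain_docs (verify against your version):

    # Reuses generator_llm / generator_embeddings from the example above.
    # transforms_llm / transforms_embedding_model are assumptions about the
    # ragas 0.2 signature; check help(generator.generate_with_langchain_docs).
    dataset = generator.generate_with_langchain_docs(
        docs,
        testset_size=10,
        transforms_llm=generator_llm,
        transforms_embedding_model=generator_embeddings,
    )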
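For point 2, any Langchain chat model can be wrapped the same way. A minimal sketch pointing LangchainLLMWrapper at an OpenAI-compatible local server; the model name and URL below are placeholders:

    from langchain_openai import ChatOpenAI
    from ragas.llms import LangchainLLMWrapper
    
    # Placeholder endpoint: any OpenAI-compatible server works
    # (vLLM, llama.cpp server, Ollama's /v1 API, ...)
    local_llm = ChatOpenAI(
        model="my-local-model",               # placeholder model name
        base_url="http://localhost:8000/v1",  # placeholder endpoint
        api_key="not-needed",
    )
    wrapped_llm = LangchainLLMWrapper(local_llm)
    generator = TestsetGenerator(llm=wrapped_llm, embedding_model=generator_embeddings)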

These solutions should help you integrate local large models and embedding models into your Ragas test dataset generation process.

To continue talking to Dosu, mention @dosu.

william-daconceicao commented 4 days ago
from langchain_ollama import ChatOllama
from langchain_ollama.embeddings import OllamaEmbeddings

from ragas import evaluate
from ragas.metrics import answer_relevancy
from ragas.llms import LangchainLLMWrapper
import os
from datasets import Dataset

questions = ["Question?",]
ground_truths = [["I'm telling the Truth"]]
answers = ['Awesome Answer']
contexts = [['My Context']]

# To dict
data = {
    "question": questions,
    "answer": answers,
    "contexts": contexts,
    "ground_truths": ground_truths
}

# Convert dict to dataset
dataset = Dataset.from_dict(data)

# Dummy key so ragas' OpenAI default doesn't complain; no OpenAI
# calls are made because local models are passed in below
os.environ['OPENAI_API_KEY'] = 'no-key'

# The model should be specified using the `model` parameter
req_llm = ChatOllama(model="llama3.2:1b")
wrapper = LangchainLLMWrapper(req_llm)
# Any Ollama model that exposes embeddings works here; a dedicated
# embedding model usually gives more meaningful relevancy scores
embeddings = OllamaEmbeddings(model="llama3.2:1b")

metrics = [answer_relevancy]

# Attach the local LLM/embeddings to each metric explicitly.
# (Passing llm/embeddings to evaluate() below also covers this.)
for m in metrics:
    setattr(m, "llm", wrapper)
    if hasattr(m, "embeddings"):
        setattr(m, "embeddings", embeddings)

# Step 3: Run the evaluation
results = evaluate(
    dataset=dataset,
    metrics=metrics,
    llm=wrapper,
    embeddings=embeddings,
)

# Step 4: Print the results
print(results)

Here's code that works with ragas 0.2.
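If you want per-sample scores rather than just the aggregate, the result object returned by evaluate can be converted to a pandas DataFrame:

df = results.to_pandas()
print(df.head())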