amin-kh96 opened this issue 1 month ago
Yes, it is mandatory to set an API key to evaluate using the evaluate function in RAGAS. The documentation states:
"Many metrics use OpenAI models and require an environment variable OPENAI_API_KEY to be set."
However, there is an alternative mentioned in the document. You can use the Opik platform, which provides a hosted version and also allows for local setup. This means you can run the Opik platform locally without needing an OpenAI API key. For more information on setting up Opik locally, refer to the installation guide.
Additionally, you can use Langfuse for online evaluation of your RAG pipelines. Langfuse allows you to score your traces and spans without needing ground-truth data, making it suitable for production environments. You can set up Langfuse with your own keys and use it to perform model-based evaluations. For more details, refer to the Langfuse documentation [1][2][3].
But I am confused: I already calculated the embeddings and I did not set the LLM, so why do I have to use the API key? Is it for the embeddings?
You need to use an API key for embedding calculations because the MetricWithEmbeddings class requires valid embeddings to be provided. The init method of this class checks whether the embeddings are valid and also verifies the API key for services like OpenAI and AzureOpenAI. If no valid embeddings are provided, it raises a ValueError.
@dataclass
class MetricWithEmbeddings(Metric):
    embeddings: t.Optional[BaseRagasEmbeddings] = None

    def init(self, run_config: RunConfig):
        """
        Init any models in the metric, this is invoked before evaluate()
        to load all the models
        Also check if the api key is valid for OpenAI and AzureOpenAI
        """
        if self.embeddings is None:
            raise ValueError(
                f"Metric '{self.name}' has no valid embeddings provided (self.embeddings is None). Please initantiate a the metric with an embeddings to run."  # noqa
            )
        self.embeddings.set_run_config(run_config)
This means that even if you did not set the LLM, you still need to provide valid embeddings and an API key for the embedding calculations to proceed. Ragas uses LLMs and embeddings for both evaluation and test set generation, and by default it uses OpenAI models. If you are using custom embeddings, they should be a subclass of BaseRagasEmbeddings, and you need to ensure that the API key is valid for the service you are using [1][2].
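For illustration only, here is a minimal sketch of how a locally hosted embedding model could be wired in so that the embedding side of a metric does not call OpenAI (the model name and the choice of answer_relevancy are assumptions for this example; metrics that also use an LLM still need one configured):

from langchain.embeddings import HuggingFaceEmbeddings  # runs locally, no API key needed
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.metrics import answer_relevancy

# Wrap a locally hosted sentence-transformer model (model name assumed for the example)
# so the embedding part of the metric does not require an OpenAI key.
local_emb = LangchainEmbeddingsWrapper(
    HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")
)
answer_relevancy.embeddings = local_emb  # or pass embeddings=local_emb to evaluate()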
I have another question. I have a vector database where PDFs and other kinds of documents are stored; they are split into chunks and vectorized. These files are very specialized, for example the chunks explain a specific error related to a specific medical device. Should I use an LLM to produce the ground truth in this case, or should I create a ground truth dataset myself? Even if I create a ground truth dataset, I have to use the PDFs or documents that are already in my vector database, and they are literally the same. Help me with this.
You should create a ground truth dataset when your vector database contains specialized documents split into chunks and vectorized, such as those explaining specific errors related to medical devices. This ensures accurate and reliable evaluations, especially in specialized domains like medical devices [1].
The ground truth dataset should include the following columns:
question: list[str]
contexts: list[list[str]], since each question can retrieve multiple text chunks.
answer: list[str]
ground_truth: list[list[str]], which corresponds to the expected answer for each question.
Here is an example of how to structure your dataset:
from datasets import Dataset
data_samples = {
    'question': ['When was the first super bowl?', 'Who won the most super bowls?'],
    'answer': ['The first superbowl was held on January 15, 1967', 'The most super bowls have been won by The New England Patriots'],
    'contexts': [
        ['The Super Bowl....season since 1966,', 'replacing the NFL...in February.'],
        ['The Green Bay Packers...Green Bay, Wisconsin.', 'The Packers compete...Football Conference'],
    ],
    'ground_truth': [['The first superbowl was held on January 15, 1967'], ['The New England Patriots have won the Super Bowl a record six times']]
}
dataset = Dataset.from_dict(data_samples)
The dataset should have the following structure:
Dataset[question: list[str], contexts: list[list[str]], answer: list[str], ground_truth: list[list[str]]]
This format ensures that the metrics can properly evaluate the RAG pipeline [2][3][4].
No, you have not understood my question. Let's assume that I have a dataset for RAG evaluation, but I do not have the ground truth. How can I create or use GT?
If you do not have an existing ground truth (GT) dataset for RAG evaluation, you can use Ragas for reference-free evaluations. Ragas can perform evaluations on various aspects of your RAG pipeline without needing ground-truth data. This allows you to run evaluations on production traces collected with Langfuse.
Here is a brief overview of how you can set up and use Ragas for this purpose:
Install Required Packages:
%pip install datasets ragas llama_index python-dotenv --upgrade
Load Your Dataset:
from datasets import load_dataset
amnesty_qa = load_dataset("explodinggradients/amnesty_qa", "english_v2")["eval"]
amnesty_qa
Define Metrics:
from ragas.metrics import faithfulness, answer_relevancy, context_precision
from ragas.metrics.critique import SUPPORTED_ASPECTS, harmfulness
metrics = [faithfulness, answer_relevancy, context_precision, harmfulness]
Initialize Metrics with LLMs and Embeddings:
from ragas.run_config import RunConfig
from ragas.metrics.base import MetricWithLLM, MetricWithEmbeddings
from langchain_openai.chat_models import ChatOpenAI
from langchain_openai.embeddings import OpenAIEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
def init_ragas_metrics(metrics, llm, embedding):
    for metric in metrics:
        if isinstance(metric, MetricWithLLM):
            metric.llm = llm
        if isinstance(metric, MetricWithEmbeddings):
            metric.embeddings = embedding
        run_config = RunConfig()
        metric.init(run_config)

llm = ChatOpenAI()
emb = OpenAIEmbeddings()

init_ragas_metrics(
    metrics,
    llm=LangchainLLMWrapper(llm),
    embedding=LangchainEmbeddingsWrapper(emb),
)
Score Your Traces:
async def score_with_ragas(query, chunks, answer):
    scores = {}
    for m in metrics:
        print(f"calculating {m.name}")
        scores[m.name] = await m.ascore(
            row={"question": query, "contexts": chunks, "answer": answer}
        )
    return scores

row = amnesty_qa[0]
question, contexts, answer = row["question"], row["contexts"], row["answer"]
await score_with_ragas(question, contexts, answer)
By following these steps, you can evaluate your RAG pipeline without needing a ground truth dataset [1].
Which metrics need GT?
How can I make sure that my embeddings are a subclass of BaseRagasEmbeddings? Is it better to let Ragas compute the embeddings from the beginning?
Metrics in Ragas that require ground truth (GT) are those whose schema includes the ground_truth column (the ground truth answer) [1].
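As a rough, non-exhaustive illustration (based on the ragas versions current at the time of this thread), metrics such as context_recall, answer_correctness, and answer_similarity compare against ground_truth, while faithfulness, answer_relevancy, and context_utilization do not need it. A minimal sketch of a single-row dataset for the GT-dependent metrics (the sample values are made up):

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import context_recall, answer_correctness  # both read the ground_truth column

# Illustrative single row; depending on your ragas version, ground_truth may need to be
# a single string per row (as here) or a list of strings.
row = {
    "question": ["What does error E42 on device X mean?"],
    "contexts": [["Error E42 indicates a sensor calibration failure on device X."]],
    "answer": ["E42 means the sensor calibration failed."],
    "ground_truth": ["Error E42 is raised when the sensor calibration fails."],
}
report = evaluate(Dataset.from_dict(row), metrics=[context_recall, answer_correctness])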
To ensure that your embeddings are a subclass of BaseRagasEmbeddings, define your embeddings class to inherit from BaseRagasEmbeddings. Here is an example:
from ragas.embeddings import BaseRagasEmbeddings
class MyEmbeddings(BaseRagasEmbeddings):
    # Implement required methods and properties
    pass
It is generally better to let Ragas compute the embeddings from the beginning to ensure compatibility and proper initialization within the framework, because the MetricWithEmbeddings class checks whether embeddings are provided and initializes them accordingly [1].
If we let Ragas compute the embeddings, is it more costly than defining an embeddings class that inherits from BaseRagasEmbeddings, or is there no real difference?
Letting Ragas compute the embeddings with its default settings can be more costly than defining a custom class that inherits from BaseRagasEmbeddings, depending on the models and APIs used.
Ragas' default behavior uses the "BAAI/bge-small-en-v1.5" model from Hugging Face, which can be cost-effective if you have the infrastructure to run it locally. However, if you use external services like OpenAI for embedding computation, the costs can add up due to API usage fees.
By defining a custom class that inherits from BaseRagasEmbeddings, you can control the embedding process and potentially use more cost-effective or specialized models that suit your needs. This approach allows you to optimize for both performance and cost.
Here is an example of how to define a custom embedding class:
from ragas.embeddings import BaseRagasEmbeddings
from ragas.run_config import RunConfig
from typing import List

class MyCustomEmbeddings(BaseRagasEmbeddings):
    def embed_query(self, text: str) -> List[float]:
        # Implement your custom embedding logic here
        pass

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Implement your custom embedding logic here
        pass

    async def aembed_query(self, text: str) -> List[float]:
        # Implement your custom embedding logic here
        pass

    async def aembed_documents(self, texts: List[str]) -> List[List[float]]:
        # Implement your custom embedding logic here
        pass

    def set_run_config(self, run_config: RunConfig):
        self.run_config = run_config
This way, you can use your own models and infrastructure, potentially reducing costs compared to using external APIs like OpenAI [1].
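As a usage sketch (assuming the method bodies above are filled in with real logic, and that dataset is a Hugging Face Dataset with question/contexts/answer text columns), the custom class is then passed to evaluate via its embeddings parameter:

from ragas import evaluate
from ragas.metrics import answer_relevancy

# Hypothetical wiring: metrics that also use an LLM will still need one configured.
custom_embeddings = MyCustomEmbeddings()
report = evaluate(
    dataset=dataset,
    metrics=[answer_relevancy],
    embeddings=custom_embeddings,
)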
Give me the conclusion for this topic; we are getting away from the real context. I know it is better to let Ragas do the embeddings, but if I want to take cost into account, is it better to use my own precomputed embeddings and then, in some way, make sure that my embeddings are suitable or compatible for use with Ragas?
@amin-kh96 were you able to solve this?
You're using faithfulness, right? That metric requires an LLM, and one will be created for you automatically.
I have a list like this:
precomputed_embeddings = [
    [-0.019296993, 0.034829292, 0.12389234],  # Embedding 1
    [-0.02389233, 0.104949302, 0.04929383],   # Embedding 2
    [-0.00283949, 0.00849322, 0.00938934]     # Embedding 3
]
and I want to use the evaluate function from RAGAS. Should I create a subclass of BaseRagasEmbeddings? If I do that, I don't want to use the texts, because I don't have them; all I have now are these embeddings. I have an example of code which says to use get_embedding, but in the code it wants me to use the text. Here is the code from ChatGPT:
from ragas.embeddings import BaseRagasEmbeddings
import numpy as np

class CustomListEmbeddings(BaseRagasEmbeddings):
    def __init__(self, precomputed_embeddings):
        """
        :param precomputed_embeddings: A list of precomputed embeddings.
        """
        self.precomputed_embeddings = precomputed_embeddings

    def get_embedding(self, text: str) -> np.ndarray:
        """
        Return the corresponding embedding for a given text.
        For this example, we assume that the text serves as an index.
        """
        # In a real scenario, you need a mapping (index) between text and embeddings
        # Here we use a dummy approach: assume text is an index or id.
        index = int(text)  # Convert text to an integer index
        return np.array(self.precomputed_embeddings[index])

    def get_embeddings(self, texts: list) -> np.ndarray:
        """
        Return embeddings for a list of texts.
        """
        embeddings = [self.get_embedding(text) for text in texts]
        return np.vstack(embeddings)  # Stack them into a 2D array

precomputed_embeddings = [
    [-0.019296993, 0.034829292, 0.12389234],  # Embedding 1
    [-0.02389233, 0.104949302, 0.04929383],   # Embedding 2
    [-0.00283949, 0.00849322, 0.00938934]     # Embedding 3
]

custom_embeddings = CustomListEmbeddings(precomputed_embeddings)

embedding = custom_embeddings.get_embedding("1")  # Assuming "1" is the index for the second embedding
print("Retrieved embedding:", embedding)
OK, I did it, but take a look at my code and my error. My code:
import json
import pandas as pd
import numpy as np
from datasets import Dataset
from langchain.embeddings import HuggingFaceEmbeddings
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.embeddings import BaseRagasEmbeddings
from ragas.metrics import faithfulness, context_utilization
from ragas import evaluate
# Load the ground truth data
file_path = 'assets\\GT.json'
with open(file_path) as f:
    ground_truth_data = json.load(f)
# Load the question and the answer and the chunks
file_path = 'assets\\user_llm_interaction_embeddings_c1521dd5_b819_4241_b3a4_3e5c1388037c.json'
with open(file_path) as f:
    llm = json.load(f)
# Initialize an empty list to hold the new dataset
data_set = []
# Iterate through the list and combine every two dictionaries
for i in range(0, len(llm), 2):
    combined_dict = {
        "text_vector_1": llm[i].get("text_vector", []),
        "text_vector_2": llm[i + 1].get("text_vector", []),
        'chunks': llm[i + 1].get('chunks', [])
    }
    data_set.append(combined_dict)

def map_chunks(data_set, ground_truth_data):
    for item in data_set:  # Iterate over each dictionary in data_set
        c = []  # Reset c for each item
        for chunk_id in item['chunks']:  # Loop through 'chunks' in the current dictionary
            for element in ground_truth_data:  # Loop through ground_truth_data
                if element['id'] == chunk_id:  # Match chunk_id with element's id
                    c.append(element['text_vector'])  # Append the matching text_vector to c
        item['chunks'] = c  # Replace the original 'chunks' (ids) with the mapped text_vector values
    return data_set  # Return the updated data_set
data_set = map_chunks(data_set, ground_truth_data)
# Assuming data_set is a list of dictionaries
ragas_data = [
    {
        "question": entry["text_vector_1"],  # Assuming this is a list of strings
        "answer": entry["text_vector_2"],    # Assuming this is a list of strings
        "contexts": entry["chunks"]          # Assuming this is a list of lists of strings
    }
    for entry in data_set
]

# Create the required structure for Dataset
formatted_data = {
    "question": [entry["question"] for entry in ragas_data],
    "contexts": [entry["contexts"] for entry in ragas_data],
    "answer": [entry["answer"] for entry in ragas_data]
}
# Convert to a Hugging Face Dataset
dataset = Dataset.from_dict(formatted_data)
# Verify the dataset structure
print(dataset)
# Creating a subclass which is compatible with BaseRagasEmbeddings
class CustomListEmbeddings(BaseRagasEmbeddings):
    def __init__(self, dataset):
        """
        :param dataset: A list of precomputed embeddings.
        """
        self.embedding_list = dataset

    def get_embedding(self, index: str) -> np.ndarray:
        """
        Return the corresponding embedding for a given index.
        Here we assume the input is just an index (or identifier) as a string.
        """
        index = int(index)  # Convert index to an integer
        return np.array(self.dataset[index])

    def get_embeddings(self, indices: list) -> np.ndarray:
        """
        Return embeddings for a list of indices.
        """
        embeddings = [self.get_embedding(index) for index in indices]
        return np.vstack(embeddings)  # Stack them into a 2D array

    def embed_documents(self, docs: list) -> np.ndarray:
        """
        Embeds a list of documents (represented as indices).
        """
        return self.get_embeddings(docs)

    def embed_query(self, query: str) -> np.ndarray:
        """
        Embeds a single query (represented as an index).
        """
        return self.get_embedding(query)
# Initialize your custom embeddings
custom_embeddings = CustomListEmbeddings(formatted_data["question"])
# Step 4: Define the list of evaluation metrics
metrics = [faithfulness, context_utilization]
# Step 5: Evaluate the data using the RAGAS `evaluate` function
evaluation_report = evaluate(dataset=dataset, metrics=metrics, embeddings=custom_embeddings)
# Step 6: Display the evaluation results
print("RAGAS Evaluation Report:")
print(evaluation_report)
# Optional: Convert the evaluation report to a DataFrame for better readability
evaluation_df = pd.DataFrame(evaluation_report.items(), columns=['Metric', 'Score'])
print("\nEvaluation Report as DataFrame:")
print(evaluation_df)
my error:
Exception has occurred: ValidationError
4608 validation errors for MultiTurnSample
user_input -> 0
value is not a valid dict (type=type_error.dict)
user_input -> 0
value is not a valid dict (type=type_error.dict)
user_input -> 0
value is not a valid dict (type=type_error.dict)
user_input -> 1
value is not a valid dict (type=type_error.dict)
user_input -> 1
value is not a valid dict (type=type_error.dict)
user_input -> 1
value is not a valid dict (type=type_error.dict)
user_input -> 2
value is not a valid dict (type=type_error.dict)
user_input -> 2
value is not a valid dict (type=type_error.dict)
I was using Azure OpenAI following this example, so I did not expect to have to set OPENAI_API_KEY; I expected to connect with AZURE_OPENAI_API_KEY:
https://docs.ragas.io/en/latest/getstarted/rag_evaluation/#__tabbed_1_3
@amin-kh96 that is a validation error from pydantic, use https://docs.ragas.io/en/latest/references/metrics/?h=single#ragas.metrics.base.SingleTurnMetric.single_turn_ascore to figure it out
@aidanrussell it is using the Azure OpenAI key, right? Could you maybe share a screenshot of the problem? I'm not quite able to understand the issue, sadly.
@dosu can the version of pydantic cause the error "fail to parse the output"? The output parser in Ragas uses a Pydantic parser, as I show here:
class RagasoutputParser(PydanticOutputParser):
    async def aparse(  # type: ignore
        self, result: str, prompt: PromptValue, llm: BaseRagasLLM, max_retries: int = 1
    ):
        try:
            output = super().parse(result)
        except OutputParserException:
            if max_retries != 0:
                p_value = FIX_OUTPUT_FORMAT.format(
                    prompt=prompt.to_string(), completion=result
                )
                output = await llm.generate(p_value)
                result = output.generations[0][0].text
                return await self.aparse(result, prompt, llm, max_retries - 1)
            else:
                logger.warning("Failed to parse output. Returning None.")
                return None
        return output
Using your own precomputed embeddings can be more cost-effective than letting Ragas compute them, especially if you already have them available and they are compatible with Ragas. This avoids computing new embeddings, which can be resource-intensive, particularly when using paid services like OpenAI [1].
To ensure your embeddings are compatible with Ragas, create a subclass of BaseRagasEmbeddings that implements the required embedding methods, such as embed_query and embed_documents. Your implementation should handle the embeddings you have, even if they are precomputed and you don't have the original text data [2].
Regarding the error you're encountering, it appears to be related to the structure of your dataset or the way the embeddings are being accessed. Ensure that your dataset is structured correctly and that the indices or identifiers used to access embeddings are valid. The error might also be related to the version of Pydantic, as it plays a role in data validation and parsing within Ragas [3].
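As an illustrative sanity check (the helper below is an assumption for this example, not part of ragas), you can verify before calling evaluate that the dataset columns hold text rather than embedding vectors, since the metrics pass these fields to an LLM as text:

from datasets import Dataset

def check_row_types(dataset: Dataset) -> None:
    # Ragas metrics work on text: question/answer should be strings and
    # contexts a list of strings per row, not lists of floats.
    for row in dataset:
        assert isinstance(row["question"], str), "question must be a string, not a vector"
        assert isinstance(row["answer"], str), "answer must be a string, not a vector"
        assert all(isinstance(c, str) for c in row["contexts"]), "contexts must be strings"

check_row_types(dataset)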
@dosu how can I solve this:
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
The issue "Failed to parse output. Returning None" is likely related to the RagasOutputParser
and its parse_output_string
method. This error occurs when the output string cannot be parsed into the expected format defined by the OutputModel
.
To resolve this issue, you can follow these steps:
Check the output format: Ensure that the output generated by the language model matches the expected JSON format defined by the OutputModel. The RagasOutputParser expects the output to comply with a specific schema.
Handle parsing exceptions: Add detailed logging to understand why the parsing is failing (see the sketch after this list). Log the output_string before parsing to see what the model is generating.
Adjust the prompt: If the output format is incorrect, consider adjusting the prompt or the input data to guide the language model toward the desired output format. The PydanticPrompt class provides methods to generate instructions and examples that can help guide the model.
Review the RagasOutputParser implementation: Ensure that it correctly handles the expected output format. If there are specific parsing rules or conditions, verify that they align with the output being generated. The parser includes a mechanism to retry parsing with a fixed output format if the initial parsing fails, which can be useful in resolving format issues.
By following these steps, you should be able to identify and resolve the parsing issue. Additionally, consider using the FixOutputFormat prompt to automatically correct output strings that do not meet the expected format [1][2].
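One way to see what the model actually returned before parsing fails is to raise the verbosity of standard Python logging for the ragas package (a minimal sketch; the exact logger names inside ragas may differ between versions):

import logging

# Ragas uses the standard logging module, so enabling DEBUG output surfaces the
# parser warnings together with more context about the failing generations.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("ragas").setLevel(logging.DEBUG)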
@dosu what is happening here that causes this error: "Failed to parse output. Returning None."?
The code:
class RagasoutputParser(PydanticOutputParser):
    async def aparse(  # type: ignore
        self, result: str, prompt: PromptValue, llm: BaseRagasLLM, max_retries: int = 1
    ):
        try:
            output = super().parse(result)
        except OutputParserException:
            if max_retries != 0:
                p_value = FIX_OUTPUT_FORMAT.format(
                    prompt=prompt.to_string(), completion=result
                )
                output = await llm.generate(p_value)
                result = output.generations[0][0].text
                return await self.aparse(result, prompt, llm, max_retries - 1)
            else:
                logger.warning("Failed to parse output. Returning None.")
                return None
        return output
The error "Failed to parse output. Returning None" occurs when the RagasOutputParser
is unable to parse the output string into the expected format. This typically happens when the output does not match the schema defined by the OutputModel
.
Here are some steps to troubleshoot and resolve this issue:
Verify the output format: Ensure that the output from the language model is in the format expected by the OutputModel. The parser expects a specific structure, and any deviation can cause parsing to fail.
Logging: Add logging to capture result before parsing attempts. This will help you understand what the model is generating and why it might not parse correctly.
Prompt adjustments: If the output format is incorrect, consider modifying the prompt or input data to better guide the language model toward the desired output format.
Use FixOutputFormat: The RagasOutputParser includes a retry mechanism using the FixOutputFormat class to attempt to correct the output format. Ensure that this mechanism is correctly implemented and that the FIX_OUTPUT_FORMAT template is properly defined to guide the correction process [1].
Review the parser implementation: Double-check the implementation of the RagasOutputParser to ensure it aligns with the expected output format and that all necessary conditions for parsing are met.
By following these steps, you should be able to identify the root cause of the parsing failure and address it effectively.
@dosu is it possible that the error "failed to parse the output" is related to the dataset being in Italian?
The error "fail to parse the output" could potentially be related to the dataset being in Italian if the language model or the parsing logic is not properly handling multilingual data. Here are a few considerations:
Language Handling: Ensure that the language model and the RagasOutputParser
are configured to handle Italian text. If the model is not trained or configured for Italian, it might produce outputs that do not match the expected format.
Prompt Adaptation: Use Ragas' language adaptation feature to adapt prompts to Italian. This involves translating few-shot examples to the target language while keeping instructions in English. This can help guide the model to produce outputs in the correct format for Italian text.
Output Format: Verify that the output format expected by the OutputModel
is language-agnostic. If there are language-specific constraints, ensure they are correctly defined for Italian.
Character Encoding: Check for any character encoding issues that might arise from handling Italian text, especially if it includes special characters or accents.
If the issue persists, consider logging the outputs and any parsing errors to better understand how the Italian dataset might be affecting the parsing process.
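For reference, a rough sketch of the prompt-adaptation step; the exact API differs between ragas versions (older releases expose a top-level adapt helper, newer ones adapt prompts per metric), so treat this as an assumption to verify against your installed version:

from ragas import adapt  # assumed available in 0.1.x-style releases
from ragas.metrics import faithfulness, answer_relevancy

# Translate the metrics' few-shot examples to Italian while keeping the
# instructions in English (behaviour as described in the ragas docs).
adapt(metrics=[faithfulness, answer_relevancy], language="italian")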
@dosu I want to create a subclass of BaseRagasLLM that, instead of generating anything, uses my dataset. The point is that I already have the question, the context, and the answer.
To create a subclass of BaseRagasLLM that uses your existing dataset instead of generating new content, implement the required methods so that they return the pre-existing answers from your dataset. Here is a basic outline of how to achieve this:
Subclass BaseRagasLLM: Implement the generate_text and agenerate_text methods to return answers from your dataset based on the provided questions.
Use a mapping: Create a mapping from questions to answers in your dataset to facilitate quick retrieval.
Here's an example implementation:
from ragas.llms.base import BaseRagasLLM
from langchain_core.outputs import LLMResult
from langchain_core.prompt_values import PromptValue
from langchain_core.callbacks import Callbacks
import typing as t
class DatasetRagasLLM(BaseRagasLLM):
    def __init__(self, dataset):
        self.dataset = dataset  # Assume dataset is a dict mapping questions to answers

    def generate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: Callbacks = None,
    ) -> LLMResult:
        question = prompt.to_string()
        answer = self.dataset.get(question, "Answer not found")
        return LLMResult(generations=[[answer]])

    async def agenerate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: t.Optional[float] = None,
        stop: t.Optional[t.List[str]] = None,
        callbacks: Callbacks = None,
    ) -> LLMResult:
        # Asynchronous version of generate_text
        return self.generate_text(prompt, n, temperature, stop, callbacks)

# Example usage
dataset = {
    "What is the capital of France?": "Paris",
    "What is the largest planet?": "Jupiter",
    # Add more question-answer pairs
}

llm = DatasetRagasLLM(dataset)
This setup allows you to use your dataset directly for generating responses, bypassing the need for actual text generation. Ensure that your dataset is structured to efficiently map questions to answers.
@dosu does it also support the context? My dataset is in this format (a list of dictionaries): new_ragas_dataset = [{'question': ..., 'context': ..., 'answer': ...}, {...}, ...]
Yes, you can support context in your subclass of BaseRagasLLM by including it in your dataset and using it when generating responses. Here's how you can modify the previous example to handle context:
Include context in your dataset: Ensure your dataset includes the context along with the questions and answers.
Modify the generate_text method: Use both the question and the context to retrieve the answer.
Here's an updated implementation:
from ragas.llms.base import BaseRagasLLM
from langchain_core.outputs import LLMResult
from langchain_core.prompt_values import PromptValue
from langchain_core.callbacks import Callbacks
import typing as t
class DatasetRagasLLM(BaseRagasLLM):
    def __init__(self, dataset):
        self.dataset = dataset  # Assume dataset is a list of dicts with 'question', 'context', and 'answer'

    def generate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: Callbacks = None,
    ) -> LLMResult:
        question = prompt.to_string()
        # Find the entry in the dataset that matches the question
        entry = next((item for item in self.dataset if item['question'] == question), None)
        if entry:
            answer = entry['answer']
        else:
            answer = "Answer not found"
        return LLMResult(generations=[[answer]])

    async def agenerate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: t.Optional[float] = None,
        stop: t.Optional[t.List[str]] = None,
        callbacks: Callbacks = None,
    ) -> LLMResult:
        # Asynchronous version of generate_text
        return self.generate_text(prompt, n, temperature, stop, callbacks)

# Example usage
new_ragas_dataset = [
    {'question': 'What is the capital of France?', 'context': 'European capitals', 'answer': 'Paris'},
    {'question': 'What is the largest planet?', 'context': 'Solar system', 'answer': 'Jupiter'},
    # Add more entries
]

llm = DatasetRagasLLM(new_ragas_dataset)
This setup allows you to use both the question and context from your dataset to generate responses. Adjust the logic as needed to incorporate context into your response generation process.
@dosu I do not understand: I already have everything in the right format, so why should I retrieve anything again?
If you already have your dataset in the correct format and it includes all the necessary information (questions, contexts, and answers), you don't need to retrieve or generate anything again. The purpose of creating a subclass of BaseRagasLLM that uses your dataset is to let the Ragas framework interact with your pre-existing data as if it were generating responses. This is useful for evaluation, where you want to simulate the behavior of a language model using your dataset.
The subclass simply provides a way to return the pre-existing answers from your dataset when the framework requests a response. This avoids unnecessary computation and leverages your existing data effectively. If your dataset is already structured correctly, you can use it directly with the subclass to provide responses without additional retrieval or generation steps.
@dosu OK, based on what you said, just give me the code.
Here's the code to create a subclass of BaseRagasLLM that uses your existing dataset to provide answers based on questions and contexts:
from ragas.llms.base import BaseRagasLLM
from langchain_core.outputs import LLMResult
from langchain_core.prompt_values import PromptValue
from langchain_core.callbacks import Callbacks
import typing as t
class DatasetRagasLLM(BaseRagasLLM):
    def __init__(self, dataset):
        self.dataset = dataset  # Assume dataset is a list of dicts with 'question', 'context', and 'answer'

    def generate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: Callbacks = None,
    ) -> LLMResult:
        question = prompt.to_string()
        # Find the entry in the dataset that matches the question
        entry = next((item for item in self.dataset if item['question'] == question), None)
        if entry:
            answer = entry['answer']
        else:
            answer = "Answer not found"
        return LLMResult(generations=[[answer]])

    async def agenerate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: t.Optional[float] = None,
        stop: t.Optional[t.List[str]] = None,
        callbacks: Callbacks = None,
    ) -> LLMResult:
        # Asynchronous version of generate_text
        return self.generate_text(prompt, n, temperature, stop, callbacks)

# Example usage
new_ragas_dataset = [
    {'question': 'What is the capital of France?', 'context': 'European capitals', 'answer': 'Paris'},
    {'question': 'What is the largest planet?', 'context': 'Solar system', 'answer': 'Jupiter'},
    # Add more entries
]

llm = DatasetRagasLLM(new_ragas_dataset)
This code defines a DatasetRagasLLM class that uses your dataset to return answers based on the questions provided. The generate_text method looks up the question in the dataset and returns the corresponding answer. This allows you to use your dataset directly for evaluations without generating new content.
@dosu Exception has occurred: ValidationError
4 validation errors for LLMResult
generations.0.0.Generation
  Input should be a valid dictionary or instance of Generation [type=model_type, input_value='Answer not found', input_type=str]
  For further information visit https://errors.pydantic.dev/2.9/v/model_type
generations.0.0.function-after[set_text(), ChatGeneration]
  Input should be a valid dictionary or instance of ChatGeneration [type=model_type, input_value='Answer not found', input_type=str]
  For further information visit https://errors.pydantic.dev/2.9/v/model_type
generations.0.0.GenerationChunk
  Input should be a valid dictionary or instance of GenerationChunk [type=model_type, input_value='Answer not found', input_type=str]
  For further information visit https://errors.pydantic.dev/2.9/v/model_type
generations.0.0.function-after[set_text(), ChatGenerationChunk]
  Input should be a valid dictionary or instance of ChatGenerationChunk [type=model_type, input_value='Answer not found', input_type=str]
  For further information visit https://errors.pydantic.dev/2.9/v/model_type
File "C:\Users\Amin\git\ragas-prototype\src\rag_eva.py", line 276, in generate_text
    return LLMResult(generations=[[answer]])
To resolve the ValidationError related to LLMResult, you need to ensure that the response is wrapped in an appropriate object, such as Generation or ChatGeneration, before being included in the LLMResult. The error occurs because the response is currently a plain string ('Answer not found') rather than an instance of the expected types.
Here's how you can modify your generate_text method to fix this:
from langchain_core.outputs import Generation, LLMResult

class DatasetRagasLLM(BaseRagasLLM):
    def __init__(self, dataset):
        self.dataset = dataset  # Assume dataset is a list of dicts with 'question', 'context', and 'answer'

    def generate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: Callbacks = None,
    ) -> LLMResult:
        question = prompt.to_string()
        # Find the entry in the dataset that matches the question
        entry = next((item for item in self.dataset if item['question'] == question), None)
        if entry:
            answer = entry['answer']
        else:
            answer = "Answer not found"
        # Wrap the answer in a Generation object
        generation = Generation(text=answer)
        # Return an LLMResult with the wrapped generation
        return LLMResult(generations=[[generation]])

    async def agenerate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: t.Optional[float] = None,
        stop: t.Optional[t.List[str]] = None,
        callbacks: Callbacks = None,
    ) -> LLMResult:
        # Asynchronous version of generate_text
        return self.generate_text(prompt, n, temperature, stop, callbacks)
By wrapping the answer in a Generation object, you ensure that the LLMResult is constructed with the expected types, thus avoiding the ValidationError [1].
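As a usage sketch (the dataset, metric choice, and embeddings object are assumptions for illustration), the custom LLM can then be passed to evaluate together with the metrics:

from ragas import evaluate
from ragas.metrics import faithfulness, context_utilization

# Hypothetical wiring: `dataset` is a Hugging Face Dataset with text columns and
# `custom_embeddings` is the BaseRagasEmbeddings subclass discussed earlier.
custom_llm = DatasetRagasLLM(new_ragas_dataset)
report = evaluate(
    dataset=dataset,
    metrics=[faithfulness, context_utilization],
    llm=custom_llm,
    embeddings=custom_embeddings,
)
print(report)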
[ ] I checked the documentation and related resources and couldn't find an answer to my question.

Your Question
I wrote this code and I get the error:
The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
Is it mandatory to set an API key to evaluate? If I have to use an API key, is there any free way to use it?

Code Examples

Additional context
I have the embeddings; my dataset is not str.