Open botchagalupeai opened 2 weeks ago
Hey @botchagalupeai! I'm here to assist you with any questions, bugs, or contributions you have. Let's work together to resolve your issue.
To convert the synthetic test set generated using Ragas into a format suitable for evaluation, follow these steps:
Generate the Synthetic Test Set: Ensure you have generated the synthetic test set as described in the "Generate a Synthetic Test Set" section of the documentation. Here is an example:
from langchain_community.document_loaders import DirectoryLoader
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
# Load documents
loader = DirectoryLoader("your-directory")
documents = loader.load()
# Ensure metadata includes 'filename'
for document in documents:
    document.metadata['filename'] = document.metadata['source']
# Initialize generator with OpenAI models
generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings()
generator = TestsetGenerator.from_langchain(
generator_llm,
critic_llm,
embeddings
)
# Generate test set
testset = generator.generate_with_langchain_docs(documents, test_size=10, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25})
# Export to Pandas DataFrame
test_df = testset.to_pandas()
Convert the Test Set for Evaluation: Use the following code snippet to convert the test set into the required format for evaluation:
from datasets import Dataset
from tqdm.auto import tqdm
import pandas as pd
def generate_response(query_engine, question):
    response = query_engine.query(question)
    return {
        "answer": response.response,
        "contexts": [c.node.get_content() for c in response.source_nodes],
    }
def generate_ragas_dataset(query_engine, test_df):
    test_questions = test_df["question"].values
    responses = [generate_response(query_engine, q) for q in tqdm(test_questions)]
    dataset_dict = {
        "question": test_questions,
        "answer": [response["answer"] for response in responses],
        "contexts": [response["contexts"] for response in responses],
        "ground_truth": test_df["ground_truth"].values.tolist(),
    }
    ds = Dataset.from_dict(dataset_dict)
    return ds
ragas_eval_dataset = generate_ragas_dataset(query_engine, test_df)
ragas_evals_df = pd.DataFrame(ragas_eval_dataset)
ragas_evals_df.head()
Evaluate the RAG Pipeline: Follow these steps to evaluate your RAG pipeline:
import os
os.environ["OPENAI_API_KEY"] = "your-openai-key"
from ragas.metrics import (
answer_relevancy,
faithfulness,
context_recall,
context_precision,
)
from ragas import evaluate
result = evaluate(
ragas_eval_dataset,
metrics=[
context_precision,
faithfulness,
answer_relevancy,
context_recall,
],
)
df = result.to_pandas()
df.head()
By following these steps, you will have a synthetic test set ready for evaluation and can analyze the performance of your RAG pipeline using the provided metrics [1][2][3].
To continue talking to Dosu, mention @dosu.
where was query_engine supposed to be defined
The query_engine is defined in the build_query_engine function. Here is the relevant code snippet to define and initialize the query_engine:
import nest_asyncio
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.embeddings import OpenAIEmbedding
nest_asyncio.apply()
def build_query_engine(documents):
    vector_index = VectorStoreIndex.from_documents(
        documents,
        service_context=ServiceContext.from_defaults(chunk_size=512),
        embed_model=OpenAIEmbedding(),
    )
    query_engine = vector_index.as_query_engine(similarity_top_k=2)
    return query_engine
# Example usage
documents = [...] # Load your documents here
query_engine = build_query_engine(documents)
Make sure you have your documents loaded and passed to the build_query_engine function to initialize the query_engine [1].
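For readers who do not have LlamaIndex set up yet, the shape that generate_response relies on can be illustrated with a stand-in object. The following is a minimal sketch, not part of Ragas or LlamaIndex: FakeQueryEngine, FakeResponse, FakeSourceNode, and FakeNode are hypothetical names, and the only assumption is that a query engine exposes query(), whose result carries response and source_nodes as used in the conversion snippet above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeNode:
    """Hypothetical stand-in for a retrieved chunk."""
    text: str

    def get_content(self) -> str:
        return self.text


@dataclass
class FakeSourceNode:
    """Hypothetical wrapper exposing the chunk under `.node`."""
    node: FakeNode


@dataclass
class FakeResponse:
    """Hypothetical response with the two attributes generate_response reads."""
    response: str
    source_nodes: List[FakeSourceNode] = field(default_factory=list)


class FakeQueryEngine:
    """Hypothetical engine: echoes the question instead of retrieving anything."""

    def query(self, question: str) -> FakeResponse:
        return FakeResponse(
            response=f"stub answer for: {question}",
            source_nodes=[FakeSourceNode(FakeNode(f"stub context for: {question}"))],
        )


def generate_response(query_engine, question):
    # Same attribute access as the conversion snippet earlier in the thread.
    response = query_engine.query(question)
    return {
        "answer": response.response,
        "contexts": [c.node.get_content() for c in response.source_nodes],
    }


result = generate_response(FakeQueryEngine(), "What is Ragas?")
print(result)
```

Any object with this shape can be swapped in for query_engine, which makes it possible to test the conversion step before wiring up a real index.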
@dosu
This is confusing. In the examples on the web page there is no vector database. It's just the synthetic dataset passed to the evaluate
The webpage examples do not include a vector database because they focus on loading documents using langchain and llama-index document loaders. The synthetic dataset is passed to the evaluate function by first generating the test set using the TestsetGenerator and then converting it to a pandas DataFrame.
Here is the relevant code snippet:
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
TEST_SIZE = 25
# generator with openai models
generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings()
generator = TestsetGenerator.from_langchain(generator_llm, critic_llm, embeddings)
# set question type distribution
distribution = {simple: 0.5, reasoning: 0.25, multi_context: 0.25}
# generate testset
testset = generator.generate_with_llamaindex_docs(
documents, test_size=TEST_SIZE, distributions=distribution
)
test_df = testset.to_pandas()
test_df.head()
This code demonstrates how to generate a synthetic test dataset using TestsetGenerator and convert it to a pandas DataFrame for evaluation [1].
@dosu
I created my synthetic dataset and I want to pass it to the evaluate instead of the amnesty_qa dataset
To pass your synthetic dataset to the evaluate function instead of using the amnesty_qa dataset, you need to replace amnesty_qa["eval"] with your synthetic dataset. Here is an example of how you can do it:
from ragas import evaluate
from ragas.metrics import (
answer_relevancy,
faithfulness,
context_recall,
context_precision,
)
# Assuming your synthetic dataset is loaded into a variable named `synthetic_dataset`
result = evaluate(
synthetic_dataset,
metrics=[
context_precision,
faithfulness,
answer_relevancy,
context_recall,
],
)
result
Make sure that your synthetic_dataset is in the format expected by the evaluate function. The dataset should contain the following columns:
question: list[str] - These are the questions your RAG pipeline will be evaluated on.
answer: list[str] - The answers generated by your RAG pipeline; metrics such as faithfulness require this column.
contexts: list[list[str]] - The contexts which were passed into the LLM to answer the question.
ground_truth: list[str] - The ground truth answer to the questions. However, this can be ignored for online evaluations since we will not have access to ground-truth data in our case.
Additionally, ensure you have your OpenAI key set in your environment if you are using metrics that require it:
import os
os.environ["OPENAI_API_KEY"] = "your-openai-key"
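The column requirements listed above can be checked before calling evaluate. The following is a rough sketch in plain Python, not a Ragas API: check_columns is a hypothetical helper, the answer column is included because the faithfulness error quoted later in this thread asks for it, and ground_truth entries are assumed to be plain strings.

```python
from typing import Any, Dict, List

# Assumed per-row types: strings everywhere except `contexts`,
# which holds a list of strings for each question.
EXPECTED = {"question": str, "answer": str, "contexts": list, "ground_truth": str}


def check_columns(data: Dict[str, List[Any]]) -> List[str]:
    """Return human-readable problems; an empty list means the shapes look right."""
    problems = []
    for name, kind in EXPECTED.items():
        if name not in data:
            problems.append(f"missing column: {name}")
            continue
        for i, value in enumerate(data[name]):
            if not isinstance(value, kind):
                problems.append(
                    f"{name}[{i}] is {type(value).__name__}, expected {kind.__name__}"
                )
    # Each `contexts` entry must itself contain only strings.
    for i, ctx in enumerate(data.get("contexts", [])):
        if isinstance(ctx, list) and not all(isinstance(c, str) for c in ctx):
            problems.append(f"contexts[{i}] contains a non-string item")
    # All checked columns must describe the same number of rows.
    lengths = {name: len(col) for name, col in data.items() if name in EXPECTED}
    if len(set(lengths.values())) > 1:
        problems.append(f"columns differ in length: {lengths}")
    return problems


good = {
    "question": ["What is Ragas?"],
    "answer": ["An evaluation library."],
    "contexts": [["Ragas evaluates RAG pipelines."]],
    "ground_truth": ["A library for evaluating RAG pipelines."],
}
print(check_columns(good))
```

A dict that passes this check has the same layout as the one given to Dataset.from_dict in the first reply.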
If you are using Amazon Bedrock instead of OpenAI, you will need to configure the BedrockChat and BedrockEmbeddings classes from Langchain and pass them to the evaluate function. Here is an example setup:
from langchain_community.chat_models import BedrockChat
from langchain_community.embeddings import BedrockEmbeddings
config = {
"credentials_profile_name": "your-profile-name", # E.g "default"
"region_name": "your-region-name", # E.g. "us-east-1"
"model_id": "your-model-id", # E.g "anthropic.claude-v2"
"model_kwargs": {"temperature": 0.4},
}
bedrock_model = BedrockChat(
credentials_profile_name=config["credentials_profile_name"],
region_name=config["region_name"],
endpoint_url=f"https://bedrock-runtime.{config['region_name']}.amazonaws.com",
model_id=config["model_id"],
model_kwargs=config["model_kwargs"],
)
bedrock_embeddings = BedrockEmbeddings(
credentials_profile_name=config["credentials_profile_name"],
region_name=config["region_name"],
)
result = evaluate(
synthetic_dataset,
metrics=[
context_precision,
faithfulness,
answer_relevancy,
context_recall,
],
llm=bedrock_model,
embeddings=bedrock_embeddings,
)
result
This setup ensures that you can use Amazon Bedrock models and embeddings for evaluation [1][2][3].
@dosu
My question is how can I convert my synthetic dataset so that it's compatible with the evaluate. I can't just pass the synthetic dataset from the synthetic data page to the evaluate function
To convert your synthetic dataset so that it's compatible with the evaluate function, you need to ensure that your dataset is in the format expected by the evaluate function. Here is an example of how to load a sample dataset and use it with the evaluate function:
from datasets import load_dataset
from ragas import evaluate
from ragas.metrics import (
context_precision,
faithfulness,
context_recall,
answer_relevancy,
)
import os
import nest_asyncio
# Set your OpenAI API key
os.environ["OPENAI_API_KEY"] = "your-openai-key"
# Load your synthetic dataset
synthetic_dataset = load_dataset("path_to_your_synthetic_dataset")
# Define the metrics
metrics = [
faithfulness,
context_recall,
context_precision,
answer_relevancy,
]
# Apply nest_asyncio if running in a Jupyter notebook
nest_asyncio.apply()
# Evaluate the dataset
result = evaluate(
synthetic_dataset["eval"],
metrics=metrics,
)
# Convert the result to a pandas DataFrame
df = result.to_pandas()
df.head()
Ensure that your synthetic dataset is structured similarly to the amnesty_qa dataset used in the example. The dataset should contain the following columns:
question: list[str] - These are the questions your RAG pipeline will be evaluated on.
answer: list[str] - The answers generated by your RAG pipeline; metrics such as faithfulness require this column.
contexts: list[list[str]] - The contexts which were passed into the LLM to answer the question.
ground_truth: list[str] - The ground truth answer to the questions.
Additionally, you can refer to the following example from the ragas framework to understand how to structure your dataset and upload it for evaluation:
import pandas as pd
from uuid import UUID
from langsmith import Client
client = Client()
# Sample questions and answers
questions = [
"What are the advantages of remote work? Why does gitlab do it?",
"what are the dis-advantages of remote work in gitlab? How do you mitigate that?",
"What does it mean to be 'all-remote'? Why is it important?",
"How does compensation work in an 'all-remote' setting?",
"How to run effective meetings in 'all-remote' setting",
]
answers = [
"Remote work offers numerous advantages including increased flexibility, the ability to hire top talent globally without geographical constraints, enhanced productivity due to fewer distractions, and significant cost savings on office space and related expenses. GitLab adopts an all-remote model to leverage these benefits, ensuring a more inclusive and diverse workforce, fostering a culture that values output over presence, and providing employees the freedom to work in environments that best suit their personal and professional needs. This approach not only supports individual employee well-being and work-life balance but also positions GitLab as a resilient and adaptive organization in a rapidly changing global work landscape.",
"GitLab's remote work challenges include onboarding difficulties, loneliness, communication breakdowns, work/life balance issues, time zone conflicts, and the need for strong time management skills. To mitigate these, GitLab employs strategies such as providing comprehensive onboarding resources, fostering community through virtual coffee chats and social calls, prioritizing asynchronous communication, reimbursing coworking spaces, empowering employees to manage their own schedules, focusing on results rather than hours, and screening for self-discipline during the hiring process. These measures aim to enhance productivity and employee satisfaction in a remote work setting.",
"Being 'all-remote' means that an organization empowers every individual to work from any location where they feel most fulfilled, without the need to report to a company-owned office, thereby treating all employees equally regardless of their physical location. This approach is important because it eliminates location hierarchy, allowing for a more inclusive work environment where team members have the autonomy to create their ideal workspace and can maintain their job regardless of life changes, such as relocations due to family commitments. It supports a diverse workforce, including caregivers, working parents, and military spouses, by providing them with the flexibility to work from anywhere, fostering equality among all employees and enabling a global talent pool without the constraints of geographical boundaries.",
"In an 'all-remote' setting, such as at GitLab, compensation is structured around local rates rather than a single global standard, which means employees are paid based on the cost of living and market rates in their respective locations. This approach allows the company to hire globally without being bound by the high salary standards of any particular region, like San Francisco. GitLab uses a compensation calculator to ensure transparency and fairness in pay, adjusting salaries based on a combination of factors including location, experience, and market data. Payments are typically made in the local currency of the employee, and for countries where direct employment isn't feasible, GitLab utilizes professional employment organizations or hires contractors. This model supports GitLab's global talent acquisition strategy while managing compensation costs effectively.",
"To run effective meetings in an 'all-remote' setting, it's crucial to be intentional about meeting necessity, provide clear agendas and supporting materials in advance, start and end on time, document discussions in real time, and make attendance optional to respect time zones and individual schedules. Recording meetings for asynchronous viewing, using reliable communication tools like Zoom, and ensuring active participation through video feedback are also key practices. This approach aligns with GitLab's guidelines for maximizing efficiency, inclusivity, and collaboration in a remote work environment.",
]
dataset = {"question": questions, "ground_truth": answers}
# Creating a pandas DataFrame from the dataset dictionary
df = pd.DataFrame(dataset)
# Upload to langsmith
langsmith_dataset = client.upload_dataframe(
name="temporal_dataset",
description="temporal dataset for testing langsmith",
df=df,
input_keys=["question"],
output_keys=["ground_truth"],
)
# Clean up the dataset after use
client.delete_dataset(dataset_id=langsmith_dataset.id)
This example demonstrates how to create a dataset with questions and ground truth answers, upload it to a service, and clean up afterward. Ensure your dataset follows a similar structure to be compatible with the evaluate function [1][2][3][4][5].
@dosu
Let me try this again.
How do I convert the dataset created by the following command testset = generator.generate_with_langchain_docs(documents, test_size=10, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25})
to be used by this code
from ragas import evaluate
result = evaluate( amnesty_qa["eval"], metrics=[ context_precision, faithfulness, answer_relevancy, context_recall, ], )
result
To convert the dataset created by the command testset = generator.generate_with_langchain_docs(documents, test_size=10, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25}) to be used by the evaluate function from the ragas library, you need to convert the testset to a format that the evaluate function can accept. The TestDataset class provides a method to_dataset() that converts the test data to a Dataset object.
Here is how you can do it:
# Assuming `testset` is the result of the generate_with_langchain_docs method
dataset = testset.to_dataset()
# Now `dataset` can be used with the `evaluate` function from the `ragas` library
from ragas import evaluate
from ragas.metrics import (
context_precision,
faithfulness,
answer_relevancy,
context_recall,
)
result = evaluate(
dataset,
metrics=[
context_precision,
faithfulness,
answer_relevancy,
context_recall,
],
)
print(result)
The to_dataset() method converts the test data into a Dataset object, which is compatible with the evaluate function [1][2].
@dosu
Getting closer. Now I get this error
ValueError Traceback (most recent call last)
Cell In[16], line 10
      2 from ragas import evaluate
      3 from ragas.metrics import (
      4     context_precision,
      5     faithfulness,
      6     answer_relevancy,
      7     context_recall,
      8 )
---> 10 result = evaluate(
     11     dataset,
     12     metrics=[
     13         context_precision,
     14         faithfulness,
     15         answer_relevancy,
     16         context_recall,
     17     ],
     18 )
     20 print(result)
File ~/.local/lib/python3.12/site-packages/ragas/_analytics.py:129, in track_was_completed.
ValueError: The metric [faithfulness] that that is used requires the following additional columns ['answer'] to be present in the dataset.
To convert the dataset created by testset = generator.generate_with_langchain_docs(documents, test_size=10, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25}) to be compatible with the evaluate function from the ragas library, ensuring it includes the required columns such as 'answer', you can use the following code:
# Convert the testset to a pandas DataFrame
test_df = testset.to_pandas()
# Ensure the DataFrame includes the 'answer' column
# Assuming 'answer' is one of the fields in the generated testset
if 'answer' not in test_df.columns:
    test_df['answer'] = test_df.apply(lambda row: row['expected_answer'], axis=1) # Adjust based on actual field names
# Collect questions and answers
test_questions = test_df['question'].values.tolist()
test_answers = [[item] for item in test_df['answer'].values.tolist()]
# Now you can use the DataFrame with the evaluate function
from ragas.llama_index import evaluate
import nest_asyncio
# Only used when running on a Jupyter notebook
nest_asyncio.apply()
# Define your metrics
from ragas.metrics import context_precision, context_recall
metrics = [context_precision, context_recall]
# Build your query engine (example with OpenAI embeddings)
from langchain.embeddings import OpenAIEmbeddings
from llama_index import VectorStoreIndex, ServiceContext
def build_query_engine(embed_model):
    vector_index = VectorStoreIndex.from_documents(
        documents, service_context=ServiceContext.from_defaults(chunk_size=512),
        embed_model=embed_model,
    )
    query_engine = vector_index.as_query_engine(similarity_top_k=2)
    return query_engine
openai_model = OpenAIEmbeddings()
query_engine = build_query_engine(openai_model)
# Evaluate
result = evaluate(query_engine, metrics, test_questions, test_answers)
result
This code ensures that the 'answer' column is included in the DataFrame, making it compatible with the evaluate function from the ragas library. Adjust the lambda function in the apply method based on the actual field names in your testset. Additionally, it demonstrates how to build a query engine and evaluate the test questions and answers using the ragas library [1][2][3][4][5].
@dosu
can you give me a working example of how to generate synthetic data and run the evaluate
Here is a working example of how to generate synthetic data and run the evaluate function using the ragas library:
Generate Synthetic Data: Use the TestsetGenerator to create a synthetic dataset.
Convert the Dataset: Ensure the dataset is in the correct format for evaluation.
Run the Evaluate Function: Use the evaluate function to evaluate the synthetic dataset.
Here is the complete code:
import os
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, faithfulness, context_recall
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
# Set your OpenAI API key
os.environ["OPENAI_API_KEY"] = "your-openai-key"
# Initialize OpenAI models
generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings()
# Create a generator instance
generator = TestsetGenerator.from_langchain(
generator_llm,
critic_llm,
embeddings
)
# Load documents
from langchain_community.document_loaders import DirectoryLoader
loader = DirectoryLoader("your-directory")
documents = loader.load()
# Ensure metadata includes 'filename'
for document in documents:
    document.metadata['filename'] = document.metadata['source']
# Generate test set
testset = generator.generate_with_langchain_docs(documents, test_size=10, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25})
# Convert to Pandas DataFrame
test_df = testset.to_pandas()
# Ensure the DataFrame includes the 'answer' column
if 'answer' not in test_df.columns:
    test_df['answer'] = test_df.apply(lambda row: row['expected_answer'], axis=1) # Adjust based on actual field names
# Convert the DataFrame to a Dataset
dataset = Dataset.from_pandas(test_df)
# Run the evaluate function
result = evaluate(
dataset,
metrics=[answer_relevancy, context_precision, faithfulness, context_recall],
)
print(result)
This example demonstrates how to generate synthetic data, convert it to the required format, and run the evaluate function using the ragas library [1][2][3][4][5].
@dosu
this code gets an error
if 'answer' not in test_df.columns: test_df['answer'] = test_df.apply(lambda row: row['expected_answer'], axis=1) # Adjust based on actual field names
KeyError Traceback (most recent call last)
File ~/.local/lib/python3.12/site-packages/pandas/core/indexes/base.py:3805, in Index.get_loc(self, key)
   3804 try:
-> 3805     return self._engine.get_loc(casted_key)
   3806 except KeyError as err:
File index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()
File index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/hashtable_class_helper.pxi:7081, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas/_libs/hashtable_class_helper.pxi:7089, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'expected_answer'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
Cell In[8], line 3
      1 # Ensure the DataFrame includes the 'answer' column
      2 if 'answer' not in test_df.columns:
----> 3     test_df['answer'] = test_df.apply(lambda row: row['expected_answer'], axis=1) # Adjust based on actual field names
...
   3815 # InvalidIndexError. Otherwise we fall through and re-raise
   3816 # the TypeError.
   3817 self._check_indexing_error(key)
KeyError: 'expected_answer'
To convert the dataset created by testset = generator.generate_with_langchain_docs(documents, test_size=10, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25}) to be used by the evaluate function without encountering a KeyError for the 'expected_answer' column, you need to ensure that the 'expected_answer' column is present in the DataFrame. You can achieve this by adding a default value for the 'expected_answer' column if it is missing.
Here is how you can do it:
# Generate the test dataset
testset = generator.generate_with_langchain_docs(documents, test_size=10, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25})
# Convert the test dataset to a pandas DataFrame
df = testset.to_pandas()
# Ensure the 'expected_answer' column is present
if 'expected_answer' not in df.columns:
    df['expected_answer'] = None # or any default value you prefer
# Ensure the 'answer' column is present
if 'answer' not in df.columns:
    df['answer'] = df['expected_answer']
# Convert the DataFrame to a Dataset
from datasets import Dataset
dataset = Dataset.from_pandas(df)
# Now you can use the DataFrame with the evaluate function
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, faithfulness, context_recall
result = evaluate(
dataset,
metrics=[answer_relevancy, context_precision, faithfulness, context_recall],
)
print(result)
This code ensures that the 'expected_answer' column is added to the DataFrame if it is missing, and then it assigns the 'expected_answer' values to the 'answer' column if 'answer' is not already present. This will prevent a KeyError when using the evaluate function [1].
@dosu
ValueError Traceback (most recent call last)
Cell In[9], line 20
     17 from ragas import evaluate
     18 from ragas.metrics import answer_relevancy, context_precision, faithfulness, context_recall
---> 20 result = evaluate(
     21     dataset,
     22     metrics=[answer_relevancy, context_precision, faithfulness, context_recall],
     23 )
     25 print(result)
File ~/.local/lib/python3.12/site-packages/ragas/_analytics.py:129, in track_was_completed.
File ~/.local/lib/python3.12/site-packages/ragas/evaluation.py:165, in evaluate(dataset, metrics, llm, embeddings, callbacks, in_ci, run_config, token_usage_parser, raise_exceptions, column_map)
    163 dataset = handle_deprecated_ground_truths(dataset)
    164 validate_evaluation_modes(dataset, metrics)
--> 165 validate_column_dtypes(dataset)
    167 # set the llm and embeddings
    168 if isinstance(llm, LangchainLLM):
...
     48 )
     50 for column_names in ["contexts"]:
     51     if column_names in ds.features:
ValueError: Dataset feature "answer" should be of type string
To convert the dataset created by testset = generator.generate_with_langchain_docs(documents, test_size=10, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25}) to be compatible with the evaluate function from the ragas library, ensuring that the 'answer' column is of type string, you can follow these steps:
Convert the testset to a pandas DataFrame.
Ensure the 'answer' column is of type string.
Convert the DataFrame back to a datasets.Dataset object.
Validate the column data types using the ragas library.
Here is the code to achieve this:
import pandas as pd
from datasets import Dataset
# Convert the testset to a pandas DataFrame
test_df = testset.to_pandas()
# Ensure the 'answer' column is of type string
if 'answer' not in test_df.columns:
    test_df['answer'] = test_df['expected_answer'].astype(str) # Adjust based on actual field names
else:
    test_df['answer'] = test_df['answer'].astype(str)
# Convert the DataFrame back to a datasets.Dataset object
test_dataset = Dataset.from_pandas(test_df)
# Validate the column data types
from ragas.validation import validate_column_dtypes
validate_column_dtypes(test_dataset)
# Now test_dataset is ready to be used with the evaluate function from the ragas library
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, faithfulness, context_recall
result = evaluate(
test_dataset,
metrics=[answer_relevancy, context_precision, faithfulness, context_recall],
)
print(result)
This will ensure that the 'answer' column in your dataset is of type string and that the dataset is validated for compatibility with the evaluate function from the ragas library [1][2].
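Casting with astype(str) can satisfy the type check, but the cast values are not answers produced by a RAG pipeline, which is what a metric such as faithfulness scores against the contexts. An alternative that matches the first reply in this thread is to fill answer and contexts by running each generated question through your own pipeline. Below is a minimal sketch, assuming a hypothetical callable ask(question) that returns an (answer, contexts) pair; build_eval_columns and fake_ask are illustrative names, not Ragas APIs.

```python
from typing import Callable, Dict, List, Tuple


def build_eval_columns(
    questions: List[str],
    ground_truths: List[str],
    ask: Callable[[str], Tuple[str, List[str]]],
) -> Dict[str, list]:
    """Run each question through the pipeline and collect evaluate()-style columns."""
    answers, contexts = [], []
    for q in questions:
        answer, retrieved = ask(q)
        answers.append(str(answer))                    # one string per question
        contexts.append([str(c) for c in retrieved])   # one list of strings per question
    return {
        "question": list(questions),
        "answer": answers,
        "contexts": contexts,
        "ground_truth": list(ground_truths),
    }


# Stand-in for a real pipeline so the sketch runs on its own.
def fake_ask(question: str) -> Tuple[str, List[str]]:
    return f"answer to: {question}", [f"context for: {question}"]


columns = build_eval_columns(["What is Ragas?"], ["An evaluation library."], fake_ask)
print(columns["answer"])
```

With real data, questions and ground_truths would come from test_df["question"].tolist() and test_df["ground_truth"].tolist(), and the returned dict can be handed to Dataset.from_dict as in the first reply.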
I am trying to follow the Get Started documentation on
https://docs.ragas.io/en/latest/getstarted/testset_generation.html
I apologize upfront for being a newbie.
I was able to create the synthetic test set from my data. However, I can't figure out how to convert the dataset (testset) into the format that the documentation example in the next step of the tutorial, "Evaluating Using Your Test Set", expects.
https://docs.ragas.io/en/latest/getstarted/evaluation.html
It would enhance the tutorial if there were an intermediate step (example) to make the process more seamless. Any assistance in this matter would be greatly appreciated.