Closed manojpraba closed 2 months ago
Hey @manojpraba! I'm here to help you with any issues you're experiencing. I can assist with bug fixes, answer questions, and guide you to become a contributor. Let me know how I can assist you!
To generate the "answer" column in your dataset and avoid the `ValueError`, you need to generate responses for the questions in your test dataset yourself, since the `TestsetGenerator` output does not include them. Here is a step-by-step guide to achieve this:
1. Generate Responses for Questions: Use a function to query your `query_engine` with each question and compile the results into a DataFrame that includes the "answer" column.
2. Create the Dataset: Use the generated responses to create a dataset that includes the "answer" column.
Here is an example code snippet to help you generate the "answer" column:
```python
from datasets import Dataset
from tqdm.auto import tqdm
import pandas as pd


def generate_response(query_engine, question):
    # Query the RAG pipeline; keep the answer text and the retrieved contexts.
    response = query_engine.query(question)
    return {
        "answer": response.response,
        "contexts": [c.node.get_content() for c in response.source_nodes],
    }


def generate_ragas_dataset(query_engine, test_df):
    # Convert to a plain list so every column has the same Python type.
    test_questions = test_df["question"].values.tolist()
    responses = [generate_response(query_engine, q) for q in tqdm(test_questions)]

    dataset_dict = {
        "question": test_questions,
        "answer": [response["answer"] for response in responses],
        "contexts": [response["contexts"] for response in responses],
        "ground_truth": test_df["ground_truth"].values.tolist(),
    }
    ds = Dataset.from_dict(dataset_dict)
    return ds


# Assuming `query_engine` and `test_df` are already defined
ragas_eval_dataset = generate_ragas_dataset(query_engine, test_df)
ragas_evals_df = pd.DataFrame(ragas_eval_dataset)
print(ragas_evals_df.head())
```
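Before pointing this at a real index, the row-building logic can be dry-run with a stand-in query engine. The sketch below is only an illustration and is not part of ragas or LlamaIndex: `FakeNode`, `FakeSourceNode`, `FakeResponse`, `FakeQueryEngine`, and `build_columns` are hypothetical names that mimic just the attributes the snippet above relies on (`response`, `source_nodes`, `node.get_content()`), and it builds a plain dict instead of a `Dataset` so it runs without extra dependencies.

```python
from dataclasses import dataclass, field


@dataclass
class FakeNode:
    text: str

    def get_content(self):
        return self.text


@dataclass
class FakeSourceNode:
    node: FakeNode


@dataclass
class FakeResponse:
    response: str
    source_nodes: list = field(default_factory=list)


class FakeQueryEngine:
    """Hypothetical stand-in that echoes the question back as the answer."""

    def query(self, question):
        return FakeResponse(
            response=f"stub answer to: {question}",
            source_nodes=[FakeSourceNode(FakeNode(f"stub context for: {question}"))],
        )


def build_columns(query_engine, questions, ground_truths):
    """Collect one answer and one list of contexts per question."""
    columns = {"question": [], "answer": [], "contexts": [], "ground_truth": []}
    for question, truth in zip(questions, ground_truths):
        result = query_engine.query(question)
        columns["question"].append(question)
        columns["answer"].append(result.response)
        columns["contexts"].append([c.node.get_content() for c in result.source_nodes])
        columns["ground_truth"].append(truth)
    return columns


columns = build_columns(FakeQueryEngine(), ["What is ragas?"], ["An evaluation library."])
print(sorted(columns))
```

If the stubbed run produces the four expected columns, swapping `FakeQueryEngine()` for a real query engine should only change the content of the rows, not their shape.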
Additionally, if you are using custom LLMs or embeddings, you can replace the default models with your own when instantiating the `TestsetGenerator`. Here is an example:
```python
from ragas.testset.generator import TestsetGenerator
from langchain_together import Together
from langchain_together.embeddings import TogetherEmbeddings

together_key = "<your-key-here>"

embeddings = TogetherEmbeddings(model="togethercomputer/m2-bert-80M-8k-retrieval")

together_completion = Together(
    model="NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT",
    temperature=0.7,
    max_tokens=4000,
    top_k=1,
    together_api_key=together_key,
)

# generator with custom llm and embeddings
generator = TestsetGenerator.from_langchain(
    generator_llm=together_completion,
    critic_llm=together_completion,
    embeddings=embeddings,
)
```
This ensures that the `TestsetGenerator` uses your specified LLM and embeddings when generating the test set. The "answer" column still has to come from your own pipeline's responses, as in the first snippet [1][2][3].
To continue talking to Dosu, mention @dosu.
@dosu what should be the value for query_engine if using azure open ai api
To use the Azure OpenAI API for generating the "answer" column with the `TestsetGenerator`, you need to set up the `query_engine` accordingly. Here is how you can do it:
```python
import nest_asyncio
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.service_context import ServiceContext
from llama_index.embeddings.openai import OpenAIEmbedding
from datasets import Dataset

nest_asyncio.apply()


def build_query_engine(documents):
    vector_index = VectorStoreIndex.from_documents(
        documents,
        service_context=ServiceContext.from_defaults(chunk_size=512),
        embed_model=OpenAIEmbedding(api_type="azure"),
    )
    query_engine = vector_index.as_query_engine(similarity_top_k=3)
    return query_engine


def generate_responses(query_engine, test_questions, test_answers):
    responses = [query_engine.query(q) for q in test_questions]

    answers = []
    contexts = []
    for r in responses:
        answers.append(r.response)
        contexts.append([c.node.get_content() for c in r.source_nodes])

    dataset_dict = {
        "question": test_questions,
        "answer": answers,
        "contexts": contexts,
    }
    # Ground truth is optional; include it only when it was provided.
    if test_answers is not None:
        dataset_dict["ground_truth"] = test_answers
    ds = Dataset.from_dict(dataset_dict)
    return ds


# `PATH` (document folder) and `eval_dataset` (DataFrame with "question" and
# "ground_truth" columns) are assumed to be defined elsewhere.
reader = SimpleDirectoryReader(PATH, num_files_limit=30, required_exts=[".pdf"])
documents = reader.load_data()

test_questions = eval_dataset["question"].values.tolist()
test_answers = eval_dataset["ground_truth"].values.tolist()

query_engine = build_query_engine(documents)
result_ds = generate_responses(query_engine, test_questions, test_answers)
```
In this setup, the `api_type` parameter is set to `"azure"` to indicate that the Azure OpenAI API is being used. This aligns with the approach described in the `ragas` library documentation for integrating with LlamaIndex and generating test datasets [1][2][3][4][5].
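`Dataset.from_dict` needs every column to have the same number of rows, and the traceback in this issue shows ragas running its own validation before evaluating. A small pre-flight check on the plain dict can surface shape problems early. This is a sketch under assumptions: `check_columns` is a hypothetical helper, and the expected types (strings for `question`, `answer`, and `ground_truth`; a list of strings for `contexts`) are simply what the snippets in this thread produce, not a statement of what ragas enforces.

```python
def check_columns(dataset_dict):
    """Return a list of human-readable problems; empty means the dict looks consistent."""
    problems = []

    # Every column must describe the same number of rows.
    lengths = {name: len(values) for name, values in dataset_dict.items()}
    if len(set(lengths.values())) > 1:
        problems.append(f"column lengths differ: {lengths}")

    # Assumed shape: plain strings for these three columns.
    for name in ("question", "answer", "ground_truth"):
        for i, value in enumerate(dataset_dict.get(name, [])):
            if not isinstance(value, str):
                problems.append(f"{name}[{i}] is {type(value).__name__}, expected str")

    # Assumed shape: each row of "contexts" is a list of strings.
    for i, ctx in enumerate(dataset_dict.get("contexts", [])):
        if not isinstance(ctx, list) or not all(isinstance(c, str) for c in ctx):
            problems.append(f"contexts[{i}] should be a list of strings")

    return problems


good = {
    "question": ["q1"],
    "answer": ["a1"],
    "contexts": [["c1", "c2"]],
    "ground_truth": ["gt1"],
}
bad = {"question": ["q1", "q2"], "answer": ["a1"], "contexts": ["not a list"]}

print(check_columns(good))
print(check_columns(bad))
```

Running a check like this on `dataset_dict` just before `Dataset.from_dict` keeps shape errors separate from errors raised by the query engine or the Azure endpoint.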
Same issue here. Did you solve this problem?
As mentioned in the documentation, "Faithfulness - Measures the factual consistency of the answer to the context based on the question." So you need to provide an answer for each query in order to compute this metric.
To do so you need to generate responses with your RAG/LLM and append them to your dataset:
> [!NOTE]
> I'm using LangChain terminology, so feel free to adapt my `llm.invoke(question).content` to your configuration. I've also taken your variable `generator`, instantiated from the `TestsetGenerator.from_langchain` class method:

```python
answers = [llm.invoke(question).content for question in generator["question"]]
```

> [!NOTE]
> Since `TestsetGenerator` doesn't seem to have a method to add a column, I've exported the `TestsetGenerator` object to a `pandas.DataFrame` object and added an "answer" column:

```python
generator_with_generated_answers = generator.to_pandas()
generator_with_generated_answers["answer"] = answers
```

> [!NOTE]
> According to the `datasets` API, the `evaluate` method expects a `Dataset` object, so I've exported back to this type:

```python
from datasets import Dataset

generated_dataset = Dataset.from_pandas(generator_with_generated_answers)
```
Finally, you can use your new dataset to evaluate your RAG/LLM:
```python
results = evaluate(
    generated_dataset,
    metrics=metrics,
    llm=azure_model,
    embeddings=azure_embeddings,
    raise_exceptions=False,
)
```
> [!WARNING]
> Since you use the same LLM to generate both your synthetic dataset's `ground_truth` and your `answer`, I think the results of this evaluation might be biased. I haven't carried out a comparative study, but this could affect how you interpret the results.
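The `ValueError` in this issue comes from a check that compares the columns a metric needs against the columns the dataset has. The same set difference can be reproduced before calling `evaluate`. The sketch below is an illustration only: `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names, and only the `answer` requirement for `faithfulness` is confirmed by the error message; the other two columns are an assumption based on the metric's description quoted above.

```python
# Hypothetical mapping; only the "answer" requirement for faithfulness is
# confirmed by the error message in this thread.
REQUIRED_COLUMNS = {
    "faithfulness": {"question", "answer", "contexts"},
}


def missing_columns(metric_name, available_columns):
    """Return the sorted columns a metric needs that the dataset lacks."""
    required = REQUIRED_COLUMNS.get(metric_name, set())
    return sorted(required - set(available_columns))


# Columns of a generated test set before any answers are attached.
testset_columns = ["question", "contexts", "ground_truth"]

print(missing_columns("faithfulness", testset_columns))
print(missing_columns("faithfulness", testset_columns + ["answer"]))
```

An empty result for every metric you plan to use suggests the dataset will get past the column check; it says nothing about the quality of the answers themselves.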
It seems the issue was answered, closing this now.
[ ] I have checked the documentation and related resources and couldn't resolve my bug.
Describe the bug
When I create test data using the code below, the answer column is missing (not generated).

```python
import nest_asyncio

nest_asyncio.apply()

generator = TestsetGenerator.from_langchain(
    generator_llm=azure_model,
    critic_llm=azure_model,
    embeddings=azure_embeddings,
)
testset = generator.generate_with_langchain_docs(
```

Ragas version:
Python version: 3.10

Code to Reproduce
Error trace

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[178], line 3
      1 import nest_asyncio
      2 nest_asyncio.apply()
----> 3 result = evaluate(
      4     dataset1,metrics=metrics1, llm=azure_model, embeddings=azure_embeddings, raise_exceptions=False)

File c:\DL projects\ai-kb-bot\.venv\lib\site-packages\ragas\evaluation.py:156, in evaluate(dataset, metrics, llm, embeddings, callbacks, in_ci, is_async, run_config, raise_exceptions, column_map)
    154 # validation
    155 dataset = handle_deprecated_ground_truths(dataset)
--> 156 validate_evaluation_modes(dataset, metrics)
    157 validate_column_dtypes(dataset)
    159 # set the llm and embeddings

File c:\DL projects\ai-kb-bot\.venv\lib\site-packages\ragas\validation.py:97, in validate_evaluation_modes(ds, metrics, evalmode_to_columns)
     91 if (
     92     isinstance(m, ContextPrecision)
     93     and "ground_truth" not in available_columns
     94 ):
     95     extra_msg = "Looks like you're trying to use 'context_precision' without ground_truth. Please use consider using `context_utilization' instead."
---> 97 raise ValueError(
     98     f"The metric [{m.name}] that that is used requires the following "
     99     f"additional columns {list(required_columns - available_columns)} "
    100     f"to be present in the dataset. {extra_msg}"
    101 )

ValueError: The metric [faithfulness] that that is used requires the following additional columns ['answer'] to be present in the dataset.
```

Expected behavior
Expecting a workaround for generating the answer column.
Additional context