**Open** — atr-ip opened this issue 3 months ago
Hey @atr-ip! I'm here to help you with any issues or questions you have. Let's get to the bottom of this together.
To ensure the generated data changes sentence by sentence using the `ragas` library, you need to create a custom `Prompt` object and use it in your evaluation metric. Here's how you can do it:
**1. Create a Custom Prompt Object:** Define a new `Prompt` object with the desired instruction and examples.
```python
from ragas.llms.prompt import Prompt

long_form_answer_prompt_new = Prompt(
    name="long_form_answer_new_v1",
    instruction="Create one or more statements from each sentence in the given answer.",
    examples=[
        {
            "question": "Which is the only planet in the solar system that has life on it?",
            "answer": "earth",
            "statements": {
                "statements": [
                    "Earth is the only planet in the solar system that has life on it."
                ]
            },
        },
        {
            "question": "Were Hitler and Benito Mussolini of the same nationality?",
            "answer": "Sorry, I can't provide an answer to that question.",
            "statements": {"statements": []},
        },
    ],
    input_keys=["question", "answer"],
    output_key="statements",
    output_type="json",
)
```
**2. Use the Custom Prompt in Evaluations:** Replace the default prompt in the `faithfulness` metric with your custom prompt.
```python
from ragas.metrics import faithfulness

faithfulness.long_form_answer_prompt = long_form_answer_prompt_new
print(faithfulness.long_form_answer_prompt.to_string())
```
**3. Evaluate the Dataset:** Use the `evaluate` function to evaluate your dataset with the new prompt configuration.
```python
from ragas import evaluate

result = evaluate(
    dataset["train"].select(range(3)),  # selecting only 3 rows
    metrics=[faithfulness],
)
result
```
By following these steps, you ensure that the generated data changes sentence by sentence as specified in the custom prompt's instructions [1].
To continue talking to Dosu, mention @dosu.
This did not help, because I do not want to change the prompts of the metrics, but of the generated data itself.
To ensure that the generated data changes sentence by sentence according to specific sentence structures using the `ragas` library, you can use the `TEXT_EXTRACT_KEYPHRASES`, `TEXT_GENERATE_QUESTIONS`, and `TEXT_GENERATE_ANSWERS` prompts. These prompts are designed to extract key phrases, generate questions, and generate answers based on the given text, ensuring that the generated data adheres to specific sentence structures.
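For intuition, these prompt constants behave like string templates whose placeholders are filled in before being sent to the LLM. Here is a plain-Python analogue of that expansion step; the template wording below is illustrative only, not the actual ragas prompt text:

```python
# Hypothetical template mimicking how TEXT_EXTRACT_KEYPHRASES.format(text=...)
# substitutes the input text into the prompt before it reaches the LLM.
KEYPHRASE_TEMPLATE = (
    "Extract the keyphrases essential for summarizing the text.\n"
    "text: {text}\n"
)

text = "JPMorgan Chase & Co. is an American multinational finance company."
prompt = KEYPHRASE_TEMPLATE.format(text=text)
print(prompt)
```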
Here's an example of how you can implement this:
```python
from ragas.llms.prompt import Prompt
from ragas.llms.output_parser import RagasoutputParser, get_json_format_instructions
from ragas.metrics._summarization import (
    TEXT_EXTRACT_KEYPHRASES,
    TEXT_GENERATE_QUESTIONS,
    TEXT_GENERATE_ANSWERS,
    SummarizationScore,
)

# Example text
text = """JPMorgan Chase & Co. is an American multinational finance company headquartered in New York City. It is the largest bank in the United States and the world's largest by market capitalization as of 2023. Founded in 1799, it is a major provider of investment banking services, with US$3.9 trillion in total assets, and ranked #1 in the Forbes Global 2000 ranking in 2023."""

# Define the prompts
extract_keyphrases_prompt = TEXT_EXTRACT_KEYPHRASES.format(text=text)

generate_questions_prompt = TEXT_GENERATE_QUESTIONS.format(
    text=text,
    keyphrases=[
        "JPMorgan Chase & Co.",
        "American multinational finance company",
        "headquartered in New York City",
        "largest bank in the United States",
        "world's largest bank by market capitalization",
        "founded in 1799",
        "major provider of investment banking services",
        "US$3.9 trillion in total assets",
        "ranked #1 in Forbes Global 2000 ranking",
    ],
)

generate_answers_prompt = TEXT_GENERATE_ANSWERS.format(
    summary="""JPMorgan Chase & Co., headquartered in New York City, is the largest bank in the US and the world's largest by market capitalization as of 2023. Founded in 1799, it offers extensive investment, private, asset management, and retail banking services, and has $3.9 trillion in assets, making it the fifth-largest bank globally. It operates the world's largest investment bank by revenue and was ranked #1 in the 2023 Forbes Global 2000.""",
    questions=[
        "Is JPMorgan Chase & Co. an American multinational finance company?",
        "Is JPMorgan Chase & Co. headquartered in New York City?",
        "Is JPMorgan Chase & Co. the largest bank in the United States?",
        "Is JPMorgan Chase & Co. the world's largest bank by market capitalization as of 2023?",
        "Was JPMorgan Chase & Co. founded in 1799?",
        "Is JPMorgan Chase & Co. a major provider of investment banking services?",
        "Does JPMorgan Chase & Co. have US$3.9 trillion in total assets?",
        "Was JPMorgan Chase & Co. ranked #1 in the Forbes Global 2000 ranking in 2023?",
    ],
)

# Assuming you have an LLM instance to generate the output
# llm = YourLLMInstance()
# keyphrases_response = llm.generate(extract_keyphrases_prompt)
# questions_response = llm.generate(generate_questions_prompt)
# answers_response = llm.generate(generate_answers_prompt)

# Parse the responses
# keyphrases = RagasoutputParser(pydantic_object=ExtractKeyphrasesResponse).parse(keyphrases_response)
# questions = RagasoutputParser(pydantic_object=GenerateQuestionsResponse).parse(questions_response)
# answers = RagasoutputParser(pydantic_object=GenerateAnswersResponse).parse(answers_response)
```
This code sets up the prompts for extracting key phrases, generating questions, and generating answers. You would need to use an LLM instance to generate the output and then parse the responses using `RagasoutputParser`. This approach ensures that the generated data changes sentence by sentence according to specific sentence structures [1].
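To illustrate the parsing step without depending on ragas internals: the raw LLM output is a JSON payload that gets validated into Python types. A minimal sketch using only the standard library, with a hypothetical response string:

```python
import json

# Hypothetical raw LLM response to the keyphrase-extraction prompt
raw_response = '{"keyphrases": ["JPMorgan Chase & Co.", "founded in 1799"]}'

def parse_keyphrases(response: str) -> list:
    """Decode the JSON payload and return its 'keyphrases' list."""
    data = json.loads(response)
    keyphrases = data.get("keyphrases")
    if not isinstance(keyphrases, list):
        raise ValueError("expected a 'keyphrases' list in the LLM output")
    return keyphrases

print(parse_keyphrases(raw_response))  # ['JPMorgan Chase & Co.', 'founded in 1799']
```

The pydantic-based `RagasoutputParser` adds schema validation and retry-on-malformed-output on top of the same basic idea.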
@dosu is there a way to customize the prompt using `TestsetGenerator.with_openai()` and `generator.generate_with_langchain_docs()`?
Do keep track of #1016; testset generation will become more customizable there 🙂
Does anyone know how to prompt data generation? I need the generated data to follow specific sentence structures. I tried to reconfigure the prompt itself, but it did not change the generated data sentence by sentence. Does anyone know how to implement this correctly?