explodinggradients / ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
https://docs.ragas.io
Apache License 2.0

Custom prompt for test set generation #1121

Open harvey1992 opened 1 month ago

harvey1992 commented 1 month ago

Is there a way to customize a prompt using the TestsetGenerator class? I looked at the documentation, and it only provides examples with default prompts used under the hood. I also looked at the code associated with the test generation class, but I didn't understand how to do it, if it is possible.

dosubot[bot] commented 1 month ago

Hello @harvey1992! I'm here to assist you with any questions you have and help resolve any issues you encounter. If you need any assistance, feel free to ask.

Yes, you can customize a prompt using the TestsetGenerator class by defining a Prompt object with specific instructions, examples, input keys, output key, output type, and language. Here is an example of how a prompt is customized:

from ragas.llms.prompt import Prompt

reasoning_question_prompt = Prompt(
    name="reasoning_question",
    instruction="""Complicate the given question by rewriting question into a multi-hop reasoning question based on the provided context.
    Answering the question should require the reader to make multiple logical connections or inferences using the information available in given context.
    Rules to follow when rewriting question:
    1. Ensure that the rewritten question can be answered entirely from the information present in the contexts.
    2. Do not frame questions that contains more than 15 words. Use abbreviation wherever possible.
    3. Make sure the question is clear and unambiguous.
    4. phrases like 'based on the provided context','according to the context',etc are not allowed to appear in the question.""",
    examples=[
        {
            "question": "What is the capital of France?",
            "context": "France is a country in Western Europe. It has several cities, including Paris, Lyon, and Marseille. Paris is not only known for its cultural landmarks like the Eiffel Tower and the Louvre Museum but also as the administrative center.",
            "output": "Linking the Eiffel Tower and administrative center, which city stands as both?",
        },
        {
            "question": "What does the append() method do in Python?",
            "context": "In Python, lists are used to store multiple items in a single variable. Lists are one of 4 built-in data types used to store collections of data. The append() method adds a single item to the end of a list.",
            "output": "If a list represents a variable collection, what method extends it by one item?",
        },
    ],
    input_keys=["question", "context"],
    output_key="output",
    output_type="str",
    language="english",
)

This example shows how to create a prompt for rewriting questions into multi-hop reasoning questions. You can customize the instruction, examples, input_keys, output_key, output_type, and language to fit your specific needs.

Additionally, you can use the TestsetGenerator class to generate synthetic test data with different types of questions, such as reasoning, conditional, and multi-context questions. Here is an example of how to customize the test data distribution:

from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# documents = load your documents

# generator with openai models
generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings()

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)

# Change resulting question type distribution
distributions = {
    simple: 0.5,
    multi_context: 0.4,
    reasoning: 0.1
}

# use generator.generate_with_llamaindex_docs if you use llama-index as document loader
testset = generator.generate_with_langchain_docs(documents, 10, distributions) 
testset.to_pandas()

This example demonstrates how to customize the distribution of different question types in the generated test data [1][2].

harvey1992 commented 1 month ago

@dosu you didn't actually use the custom prompt in the test generation.

dosubot[bot] commented 1 month ago

To integrate the custom prompt into the TestsetGenerator for generating test sets, you need to define the prompt and then use it within the TestsetGenerator. Here is an example of how you can integrate the custom prompt:

  1. Define the Custom Prompt: Ensure the prompt is defined as shown below.
  2. Integrate into TestsetGenerator: Use the defined prompt within the TestsetGenerator.

Here is an example code snippet:

from ragas.llms.prompt import Prompt
from ragas.testset.generator import TestsetGenerator

# Define the custom prompt
custom_prompt = Prompt(
    name="custom-prompt",
    instruction="Create one or more statements from each sentence in the given answer.",
    examples=[
        {
            "question": "Cadmium Chloride is slightly soluble in this chemical, it is also called what?",
            "answer": "alcohol",
            "statements in json": """{
                                        "statements": [
                                        "Cadmium Chloride is slightly soluble in alcohol."
                                        ]
                                    }""",
        },
        {
            "question": "Were Hitler and Benito Mussolini of the same nationality?",
            "answer": "Sorry, I can't provide answer to that question.",
            "statements in json": """{
                                        "statements": []
                                    }""",
        },
    ],
    input_keys=["question", "answer"],
    output_key="statements in json"
)

# Initialize the TestsetGenerator with the custom prompt
testset_generator = TestsetGenerator(prompt=custom_prompt)

# Generate test sets using the custom prompt
test_sets = testset_generator.generate_test_sets()

This code assumes that the TestsetGenerator class has a method generate_test_sets that uses the provided prompt to generate test sets. If the TestsetGenerator class or its methods differ, you may need to adjust the integration accordingly [1][2].

jjmachan commented 1 month ago

@harvey1992 As of today, customizing the testset generation is hard; you would have to re-implement the evolutions yourself.

What modifications were you planning to make, though?

harvey1992 commented 1 month ago

@jjmachan The default test generation instructions prompt the LLM to generate questions that read like exam questions. I wanted to modify the instructions so the output is closer to what an actual human might write.

jjmachan commented 1 month ago

@harvey1992 gotcha!

You can actually modify the prompts quite easily. All you have to do is create a new Prompt with your own instructions and then replace simple.seed_question_prompt with it. You can do the same for the other evolutions too.

Let me know if this solves the issue.
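
For reference, a quick way to inspect the default prompt and discover which prompt attributes each evolution exposes (the exact attribute names can vary between ragas versions, so treat this as an exploratory sketch):

from ragas.testset.evolutions import simple, reasoning, multi_context

# The default prompt used to seed "simple" questions
print(simple.seed_question_prompt)

# List the prompt-like attributes each evolution exposes, so you know
# which ones can be swapped for a custom Prompt
for evolution in (simple, reasoning, multi_context):
    prompt_attrs = [attr for attr in dir(evolution) if attr.endswith("_prompt")]
    print(type(evolution).__name__, prompt_attrs)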

harvey1992 commented 1 month ago

could you show me a simple example?

jjmachan commented 1 month ago

I'm not sure about your use case, but something you could try is:

from ragas.testset.evolutions import simple

simple.seed_question_prompt

and then use the Prompt Objects page in the Ragas docs to create the new one.

If you want, we can jump on a call to better understand your use case and help you out too.

atr-ip commented 1 month ago

@jjmachan I have tried it, and that alone is not enough to change the generated response. Will it be possible in the future to generate your own question/answer pairs with your own prompts for a given context?

here is my example:

from ragas.llms.prompt import Prompt
from ragas.testset.evolutions import simple

own_seed_question_prompt = Prompt(
    name="seed_question",
    instruction="Generate a question that can be fully answered from given context. The question should be formed using topic",
    examples=[
        {
            "context": "Goal The XY was set out to replace another system, ABC. Right now it is only available to perform a partial role of the original system, the visualization of input data. It is out of scope for XY to perform further operational tasks.",
            "keyphrase": "XY",
            "question": "What was the initial purpose of XY project?",
        },
        {
            "context": " 8904 William Shakespeare 8905 Jane Austen    8983 Mark Twain  8916 Charles Dickens   8918 Leo Tolstoy    8920 Fyodor Dostoevsky  8921 George Orwell  8969 Ernest Hemingway  8974 Franz Kafka   8977 J. R. R. Tolkien   8978 Oscar Wilde 8989 Alexander Pushkin    8994 Victor Hugo   8995 John Steinbeck  8996 Walt Whitman    8997 Charles Baudelaire   8998 Virginia Woolf",
            "keyphrase": "Alexander Pushkin",
            "question": "What is Alexander Pushkins' port?",
        },
        {
            "context": "MaximumLinesWritten float Maximum number of lines written in a single poem Lines/Day MinutesOfInspiration float Sum of the durations of all individual writing sessions in the BIN and the period of 15 minutes. NumberOfPoems int Number of poems completed RejectionReasons string Reasons for the rejection of poems by publishers. The reason is given as text with associated number to show how many occurrences are seen in the BIN for the period of 15 minutes. BlockReasons string Reasons for the writer's block. The reason is given as text with associated number to show how many occurrences are seen in the BIN for the period of 15 minutes. StationaryMinutesOfInspiration float The duration of time measured in minutes in which the poet has been classified to be stationary and inspired for the BIN and the period of 15 minutes. Minutes",
            "keyphrase": "metaphor usage",
            "question": "In poetry analysis, which metric determines the frequency of metaphor usage in a given poem?",
        },
    ],
    input_keys=["context", "keyphrase"],
    output_key="question",
    output_type="str"
)

simple.seed_question_prompt = own_seed_question_prompt

jjmachan commented 1 month ago

Hey @atr-ip, I think I'm not quite understanding the question. Today you can do what you have mentioned above: the simple evolution has the prompt you specified, and the generation process will follow the instructions in that prompt.

Or is it not working as expected and you are facing some other problem?

From what I understand, the root concern is that you need more control over the generated testsets. This is something we are hoping to address with #1016, so do keep tabs on that one 🙂

harvey1992 commented 1 month ago

@jjmachan I am doing something similar to this example from the documentation:

# generator with openai models
generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings()

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)

# Change resulting question type distribution
distributions = {
    simple: 0.5,
    multi_context: 0.4,
    reasoning: 0.1
}

# use generator.generate_with_llamaindex_docs if you use llama-index as document loader
testset = generator.generate_with_langchain_docs(documents, 10, distributions) 
testset.to_pandas()

I am currently using the default instructions to generate this dataset. However, I would like to slightly modify how the questions are generated based on their type (simple, multi-context, reasoning). You mentioned using simple.seed_question_prompt, but I'm unsure how to proceed, since I can't pass my custom prompt to the class in any way.

jjmachan commented 1 month ago

@harvey1992 Mixing what @atr-ip wrote above with your code, you could try something like:

from ragas.llms.prompt import Prompt
from ragas.testset.evolutions import simple

own_seed_question_prompt = Prompt(
    name="seed_question",
    instruction="Generate a question that can be fully answered from given context. The question should be formed using topic",
    examples=[
        {
            "context": "Goal The XY was set out to replace another system, ABC. Right now it is only available to perform a partial role of the original system, the visualization of input data. It is out of scope for XY to perform further operational tasks.",
            "keyphrase": "XY",
            "question": "What was the initial purpose of XY project?",
        },
        {
            "context": " 8904 William Shakespeare 8905 Jane Austen    8983 Mark Twain  8916 Charles Dickens   8918 Leo Tolstoy    8920 Fyodor Dostoevsky  8921 George Orwell  8969 Ernest Hemingway  8974 Franz Kafka   8977 J. R. R. Tolkien   8978 Oscar Wilde 8989 Alexander Pushkin    8994 Victor Hugo   8995 John Steinbeck  8996 Walt Whitman    8997 Charles Baudelaire   8998 Virginia Woolf",
            "keyphrase": "Alexander Pushkin",
            "question": "What is Alexander Pushkins' port?",
        },
        {
            "context": "MaximumLinesWritten float Maximum number of lines written in a single poem Lines/Day MinutesOfInspiration float Sum of the durations of all individual writing sessions in the BIN and the period of 15 minutes. NumberOfPoems int Number of poems completed RejectionReasons string Reasons for the rejection of poems by publishers. The reason is given as text with associated number to show how many occurrences are seen in the BIN for the period of 15 minutes. BlockReasons string Reasons for the writer's block. The reason is given as text with associated number to show how many occurrences are seen in the BIN for the period of 15 minutes. StationaryMinutesOfInspiration float The duration of time measured in minutes in which the poet has been classified to be stationary and inspired for the BIN and the period of 15 minutes. Minutes",
            "keyphrase": "metaphor usage",
            "question": "In poetry analysis, which metric determines the frequency of metaphor usage in a given poem?",
        },
    ],
    input_keys=["context", "keyphrase"],
    output_key="question",
    output_type="str"
)

simple.seed_question_prompt = own_seed_question_prompt # now the prompt has been changed

# generator with openai models
generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings()

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)

# Change resulting question type distribution
distributions = {
    simple: 0.5,
    multi_context: 0.4,
    reasoning: 0.1
}

# use generator.generate_with_llamaindex_docs if you use llama-index as document loader
testset = generator.generate_with_langchain_docs(documents, 10, distributions) 
testset.to_pandas()

Let me know if that works.

harvey1992 commented 1 month ago

@jjmachan I modified the open-source code and got it working for my needs. Thanks!

jjmachan commented 1 month ago

@harvey1992 Glad you are unblocked, but do keep tabs on #1016; we will be improving customization further there.

fschuh commented 1 month ago

> @harvey1992 Mixing what @atr-ip wrote above with your code, you could try something like [...] Let me know if that works.

I use a similar solution to customize some of the prompts to get them working with less powerful LLMs that I run locally.
However, just updating the internal Ragas prompt variables isn't enough: you also need to reload the Ragas testset modules, because the original prompts have already been imported by the Ragas code.

Here's what I do to reload the modules and have my custom prompts kick in:

import importlib

import ragas.testset.prompts as ragas_prompts
import ragas.testset.filters
import ragas.testset.evolutions
import ragas.testset.extractor
import ragas.testset.generator

from ragas.llms.prompt import Prompt  # needed for the custom Prompt below

# declare all custom ragas prompts
keyphrase_extraction_prompt = Prompt(
    name="keyphrase_extraction",
    instruction="Extract the top 3 to 5 keyphrases from the provided text, focusing on the most significant and distinctive aspects. Only output JSON, and nothing else.",
    examples=[
        {
            "text": "A black hole is a region of spacetime where gravity is so strong that nothing, including light and other electromagnetic waves, has enough energy to escape it. The theory of general relativity predicts that a sufficiently compact mass can deform spacetime to form a black hole.",
            "output": {
                "keyphrases": [
                    "Black hole",
                    "Region of spacetime",
                    "Strong gravity",
                    "Light and electromagnetic waves",
                    "Theory of general relativity",
                ]
            },
        },
        {
            "text": "The Great Wall of China is an ancient series of walls and fortifications located in northern China, built around 500 years ago. This immense wall stretches over 13,000 miles and is a testament to the skill and persistence of ancient Chinese engineers.",
            "output": {
                "keyphrases": [
                    "Great Wall of China",
                    "Ancient fortifications",
                    "Northern China",
                ]
            },
        },
    ],
    input_keys=["text"],
    output_key="output",
    output_type="json",
)

def apply_custom_prompts():
    # override ragas prompts here
    ragas_prompts.keyphrase_extraction_prompt = keyphrase_extraction_prompt

    importlib.reload(ragas.testset.filters)
    importlib.reload(ragas.testset.evolutions)
    importlib.reload(ragas.testset.extractor)
    importlib.reload(ragas.testset.generator)
    # from this point onwards, Ragas will use our custom prompts

This still feels like a brittle hack though, as these internal variables could change and/or disappear in future versions of Ragas.
It would be nice to have the public API allow custom prompts in some form, without having to resort to fiddling with the Ragas internals.

harvey1992 commented 1 month ago

@fschuh I updated the evolution file and generator file to allow custom prompts to be passed via arguments.

CosaroLisa commented 1 month ago

@harvey1992

> @fschuh I updated the evolution file and generator file to allow custom prompts to be passed via arguments.

Could you give a simple example of that?

harvey1992 commented 1 month ago

@CosaroLisa I don't want to post a ton of code here, but in the evolutions file I created a setter to set the prompt for a given evolution type, and then I added the corresponding initialization in the init_evolution method of the generator class.
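
Roughly, the shape of the change is something like the sketch below. These are simplified stand-ins, not the actual ragas internals; the real Evolution and TestsetGenerator classes have many more fields, and the attribute names here are only illustrative.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Prompt:
    # Simplified stand-in for ragas.llms.prompt.Prompt
    name: str
    instruction: str

@dataclass
class Evolution:
    # Simplified stand-in for a ragas evolution
    seed_question_prompt: Optional[Prompt] = None

    def set_seed_question_prompt(self, prompt: Prompt) -> None:
        # Setter added so callers can swap in a custom prompt per evolution type
        self.seed_question_prompt = prompt

@dataclass
class TestsetGenerator:
    # Simplified stand-in: the patched generator accepts a custom prompt...
    custom_seed_question_prompt: Optional[Prompt] = None

    def init_evolution(self, evolution: Evolution) -> Evolution:
        # ...and applies it while initializing each evolution
        if self.custom_seed_question_prompt is not None:
            evolution.set_seed_question_prompt(self.custom_seed_question_prompt)
        return evolution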

tomi-neuron commented 3 weeks ago

Hi @jjmachan, is my understanding correct that if I want to fine-tune the generator for a specific domain, it is enough to change the simple.seed_question_prompt examples to my custom ones? This is the "seed" from which the other question types (reasoning, conditional, multi-context) are evolved (and, if I'm not mistaken, it is also the question used for the "simple" question type). So, is changing simple.seed_question_prompt sufficient and the right way to go?

I have a set of ground-truth question/answer/context triplets and would like to use them as few-shot examples for the generator, so that the original seed questions from which the other questions are rephrased/evolved are closer to my domain. However, the seed_question_prompt examples also include keyphrases. Is it possible to use the KeyphraseExtractor for this, or is it better to fill them in manually for the few-shot ground truth?
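
Concretely, what I have in mind is something like the sketch below, with the keyphrases filled in by hand; the example values are placeholders, and the Prompt fields follow the examples earlier in this thread:

from ragas.llms.prompt import Prompt
from ragas.testset.evolutions import simple

# Ground-truth triplets from my domain, with keyphrases added manually;
# the values here are placeholders
domain_examples = [
    {
        "context": "Text of a document from my domain.",
        "keyphrase": "some domain keyphrase",
        "question": "A question a real user from my domain would ask?",
    },
]

domain_seed_question_prompt = Prompt(
    name="seed_question",
    instruction=(
        "Generate a question that can be fully answered from the given context. "
        "The question should be formed using the topic."
    ),
    examples=domain_examples,
    input_keys=["context", "keyphrase"],
    output_key="question",
    output_type="str",
)

# Replace the default seed prompt so the seed questions (and whatever is
# evolved from them) start out closer to my domain
simple.seed_question_prompt = domain_seed_question_prompt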

derekhsu commented 3 days ago

No; if you want to use a customized keyphrase extractor, you can create your own DocumentStore and pass it via the docstore parameter.
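
A rough sketch of that approach is below. The wrapper classes and constructor arguments shown reflect ragas 0.1.x and may differ in other releases, so treat this as an outline rather than a drop-in snippet:

from langchain.text_splitter import TokenTextSplitter
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.llms import LangchainLLMWrapper
from ragas.testset.docstore import InMemoryDocumentStore
from ragas.testset.extractor import KeyphraseExtractor
from ragas.testset.generator import TestsetGenerator

generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-3.5-turbo-16k"))
critic_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4"))
embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

# Build the docstore yourself with your own keyphrase extractor
# (e.g. a subclass of KeyphraseExtractor with a customized prompt)
keyphrase_extractor = KeyphraseExtractor(llm=generator_llm)
docstore = InMemoryDocumentStore(
    splitter=TokenTextSplitter(chunk_size=1000, chunk_overlap=0),
    embeddings=embeddings,
    extractor=keyphrase_extractor,
)

# Pass the docstore directly instead of using TestsetGenerator.from_langchain
generator = TestsetGenerator(
    generator_llm=generator_llm,
    critic_llm=critic_llm,
    embeddings=embeddings,
    docstore=docstore,
)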