explodinggradients / ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
https://docs.ragas.io

Test data generation for function calling #1313

Open alexHeu opened 4 days ago

alexHeu commented 4 days ago

Hey, do you think it would be possible to create a synthetic dataset for function calling tasks? I would like to use that dataset for a finetuning experiment.

Thanks for any guidance that you might have!

dosubot[bot] commented 4 days ago

Hey @alexHeu! I'm a bot designed to assist you with bugs, questions, and contributions. How can I help you today?

Yes, it is possible to create a synthetic dataset for function calling tasks to use in a finetuning experiment. Ragas provides tools for synthetic test data generation, which can be applied to function calling tasks. Here is a relevant code snippet for generating a synthetic test dataset:

from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

TEST_SIZE = 25

# generator and critic LLMs plus embeddings, using OpenAI models
generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings()

generator = TestsetGenerator.from_langchain(generator_llm, critic_llm, embeddings)

# set the question type distribution (weights should sum to 1.0)
distribution = {simple: 0.5, reasoning: 0.25, multi_context: 0.25}

# generate the testset; `documents` is a list of LlamaIndex documents
# loaded beforehand (see the loading sketch below)
testset = generator.generate_with_llamaindex_docs(
    documents, test_size=TEST_SIZE, distributions=distribution
)
test_df = testset.to_pandas()
test_df.head()
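
If you don't already have `documents`, a minimal loading sketch using LlamaIndex's SimpleDirectoryReader might look like the following. The local `data/` directory is an assumption; point it at wherever your source files live, and adjust the import path to match your installed llama-index version.

from llama_index.core import SimpleDirectoryReader  # on older versions: from llama_index import SimpleDirectoryReader

# load every file in ./data into LlamaIndex Document objects
documents = SimpleDirectoryReader("data").load_data()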

Ragas employs an evolutionary generation paradigm to create diverse and complex questions from the provided documents. This approach ensures comprehensive coverage of various question types, including reasoning, conditioning, and multi-context questions, which is crucial for robust evaluation and finetuning experiments [1][2].
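
Note that the generated testset contains question/contexts/ground-truth rows rather than function calls, so a function-calling finetuning experiment would still need a post-processing step. Below is a minimal sketch of one possible conversion, assuming the dataframe exposes question, contexts, and ground_truth columns (as recent ragas versions do) and using a purely hypothetical `search_documents` tool with an OpenAI-style chat/tool-call record layout:

import json

def to_function_calling_example(row):
    # Map one ragas testset row onto an OpenAI-style function-calling
    # training example: the model answers the question by first calling
    # a hypothetical `search_documents` tool, then responding with the
    # ground-truth answer grounded in the retrieved contexts.
    return {
        "messages": [
            {"role": "user", "content": row["question"]},
            {
                "role": "assistant",
                "content": None,
                "tool_calls": [
                    {
                        "id": "call_1",
                        "type": "function",
                        "function": {
                            "name": "search_documents",  # hypothetical tool name
                            "arguments": json.dumps({"query": row["question"]}),
                        },
                    }
                ],
            },
            # tool result: the retrieved contexts from the testset row
            {
                "role": "tool",
                "tool_call_id": "call_1",
                "content": json.dumps({"contexts": list(row["contexts"])}),
            },
            # final grounded answer
            {"role": "assistant", "content": row["ground_truth"]},
        ]
    }

records = [to_function_calling_example(r) for _, r in test_df.iterrows()]
with open("function_calling_finetune.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

The tool name, argument schema, and message layout here are illustrative, not part of ragas: the library supplies the question/context/answer raw material, and the function-calling structure is layered on top to match whatever format your finetuning stack expects.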

To continue talking to Dosu, mention @dosu.