Open alexHeu opened 4 days ago
Hey @alexHeu! I'm a bot designed to assist you with bugs, questions, and contributions. How can I help you today?
Yes, it is possible to create a synthetic dataset for function calling tasks to use in a finetuning experiment. Ragas provides tools for synthetic test data generation, and the resulting question/answer pairs can serve as raw material for a function calling dataset. Here is a relevant code snippet for generating a synthetic test dataset:
```python
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

TEST_SIZE = 25

# generator with OpenAI models
generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings()
generator = TestsetGenerator.from_langchain(generator_llm, critic_llm, embeddings)

# set question type distribution (weights should sum to 1)
distribution = {simple: 0.5, reasoning: 0.25, multi_context: 0.25}

# generate testset; `documents` is assumed to be a list of LlamaIndex
# Document objects loaded beforehand
testset = generator.generate_with_llamaindex_docs(
    documents, test_size=TEST_SIZE, distributions=distribution
)

test_df = testset.to_pandas()
test_df.head()
```
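One easy mistake here is passing question-type weights that do not sum to 1, which Ragas will reject at generation time. A minimal pure-Python sanity check (using string keys as stand-ins for the ragas evolution objects) might look like:

```python
# Sketch: verify question-type weights sum to 1 before calling the generator.
# The string keys below are stand-ins for the ragas evolution objects
# (simple, reasoning, multi_context).
distribution = {"simple": 0.5, "reasoning": 0.25, "multi_context": 0.25}

total = sum(distribution.values())
if abs(total - 1.0) > 1e-9:
    raise ValueError(f"distribution weights sum to {total}, expected 1.0")
```

Running this check before the (potentially slow and costly) LLM-backed generation step fails fast on a misconfigured distribution.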
Ragas employs an evolutionary generation paradigm to create diverse and complex questions from provided documents. This approach ensures comprehensive coverage of various question types, including reasoning, conditioning, and multi-context, which are crucial for robust evaluation and finetuning experiments [1][2].
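Note that Ragas emits question/ground-truth pairs, not function-calling records, so you would post-process the generated testset into your finetuning format. Below is a hedged sketch of that conversion step: the `search_docs` tool schema and the `row_to_finetune_example` helper are hypothetical illustrations, and the target format shown is the OpenAI-style chat JSONL with assistant `tool_calls`; adapt both to your actual tools and provider.

```python
import json

# Hypothetical tool schema -- replace with the real functions your
# finetuned model should learn to call.
SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Search the document corpus for an answer.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}

def row_to_finetune_example(row: dict) -> dict:
    """Map one synthetic testset row (question + ground_truth) to a
    chat-format finetuning record where the assistant issues a tool call
    and the tool returns the ground-truth answer."""
    return {
        "messages": [
            {"role": "user", "content": row["question"]},
            {
                "role": "assistant",
                "tool_calls": [
                    {
                        "type": "function",
                        "function": {
                            "name": "search_docs",
                            "arguments": json.dumps({"query": row["question"]}),
                        },
                    }
                ],
            },
            {"role": "tool", "content": row["ground_truth"]},
        ],
        "tools": [SEARCH_TOOL],
    }

# In practice you would iterate over test_df.to_dict("records"); a stub
# row is used here so the sketch is self-contained.
rows = [{"question": "What does Ragas do?",
         "ground_truth": "It evaluates RAG pipelines."}]
jsonl = "\n".join(json.dumps(row_to_finetune_example(r)) for r in rows)
```

Writing `jsonl` to a file then gives you a dataset you can upload to a finetuning API; you may also want to filter or manually review rows first, since synthetic questions vary in quality.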
To continue talking to Dosu, mention @dosu.
Hey, I was wondering if you think it would be possible to create a synthetic dataset for function calling tasks? I would like to use that dataset for a finetuning experiment.
Thanks for any guidance that you might have!