BatsResearch / bonito

A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.
BSD 3-Clause "New" or "Revised" License

Getting diverse outputs when sampling n>1 #18

eaubin closed this issue 7 months ago

eaubin commented 8 months ago

Do you have any suggestions for generating diverse outputs for a context? I've tried different sampling parameters, but the generated sequences always seem to be the same. For example:

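import datasets
from bonito import Bonito
from vllm import SamplingParams

# Assumed setup, not shown in the original post (model name as in the bonito README).
bonito_model = Bonito("BatsResearch/bonito-v1")
context_dataset = datasets.Dataset.from_list([{"context": "The Space Shuttle program was the fourth human spaceflight program carried out by the U.S. National Aeronautics and Space Administration (NASA)"}])
# n=3 requests three samples for the same context.
synthetic_dataset = bonito_model.generate_tasks(
    context_dataset,
    context_col="context",
    task_type="exqa",
    sampling_params=SamplingParams(max_tokens=256, top_p=0.5, temperature=1.1, n=3),
)
for r in synthetic_dataset:
    print(r)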

This prints three identical tasks:

{'input': 'Refer to the passage below and answer the following question:\n\nPassage: The Space Shuttle program was the fourth human spaceflight program carried out by the U.S. National Aeronautics and Space Administration (NASA)\n\nQuestion: What does NASA stand for?', 'output': 'National Aeronautics and Space Administration'}
{'input': 'Refer to the passage below and answer the following question:\n\nPassage: The Space Shuttle program was the fourth human spaceflight program carried out by the U.S. National Aeronautics and Space Administration (NASA)\n\nQuestion: What does NASA stand for?', 'output': 'National Aeronautics and Space Administration'}
{'input': 'Refer to the passage below and answer the following question:\n\nPassage: The Space Shuttle program was the fourth human spaceflight program carried out by the U.S. National Aeronautics and Space Administration (NASA)\n\nQuestion: What does NASA stand for?', 'output': 'National Aeronautics and Space Administration'}

nihalnayak commented 7 months ago

Thank you for catching this bug! I have created a PR to fix it.

Besides this bug, to increase the diversity of tasks, I suggest increasing the top_p value from 0.5 to 0.9 or 0.95 and using a slightly longer paragraph.
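
To illustrate why a low top_p can collapse samples onto one output, here is a minimal sketch of nucleus (top-p) filtering in plain Python. It is not vLLM's implementation; the function name `nucleus` and the toy next-token probabilities are hypothetical and chosen only for illustration.

```python
def nucleus(probs, top_p):
    """Keep the most likely tokens until their cumulative probability
    reaches top_p, then renormalize the kept probabilities."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Hypothetical distribution over the first word of a generated question.
toy_probs = {"What": 0.55, "Which": 0.20, "Who": 0.12, "When": 0.08, "How": 0.05}

narrow = nucleus(toy_probs, top_p=0.5)  # only the single most likely token survives
wide = nucleus(toy_probs, top_p=0.9)    # four candidate tokens survive

print(list(narrow))
print(list(wide))
```

In this toy case, top_p=0.5 leaves a single candidate, so every one of the n samples starts the same way, while top_p=0.9 leaves four candidates to sample from. A longer context passage would presumably help in a similar way, since it gives the model more facts to ask about.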