OpenBMB / UltraFeedback

A large-scale, fine-grained, diverse preference dataset (and models).
MIT License

evol_instruct issues: prompts with missing data #12

Open sanderland opened 8 months ago

sanderland commented 8 months ago
import datasets

ds = datasets.load_dataset('openbmb/ultrafeedback')
print(ds['train'][490]['instruction'])

Gives

Add a requirement for the given prompt that the hashtag must also include the top 3 countries with the highest sustainable energy consumption in 2020, based on their percentage of total energy consumption.

But there is no "given prompt". This seems to affect several of the evol_instruct prompts. Also note that the completions for such samples contain wild hallucinations, yet the ratings evaluate them as free of hallucinations.

In addition, even evol_instruct prompts that do include the prompt to be modified often have issues, with either the model or the evaluator interpreting the instruction as a request to answer the original prompt.

lifan-yuan commented 8 months ago

Hi,

Thanks for pointing this out! We will check these samples immediately and get back to you after processing.

sanderland commented 7 months ago

These are some strings that are common in problematic prompts:

["Rewritten Prompt", "the given prompt and rewrite", "The Given Prompt"]

lifan-yuan commented 7 months ago

Thanks for your assistance!

I've carefully inspected all these samples and found they are about prompt engineering. None of the models, including the GPT-4 judge, are able to follow these instructions. Since such challenging instructions should be meaningful for examining models' instruction-following ability, we intend to manually rectify them rather than remove them from the dataset.

Currently, I am still trying to get the models, especially the GPT-4 judge, to understand these instructions, though little progress has been made. I'd appreciate it very much if anyone could help!