IBM / Dromedary

Dromedary: towards helpful, ethical and reliable LLMs.
GNU General Public License v3.0

Topic-guided red-teaming: issues with prompt/examples #10

Closed sanderland closed 1 year ago

sanderland commented 1 year ago
  1. Topic-guided red-teaming

I am trying to understand how this step works, and some differences between what the paper says and what the prompt implies.

The paper seems to imply this step is mainly for guiding instruction generation along topics, including difficult-to-answer ones, but the prompt explicitly asks for all instructions to be of the type

> a machine learning model can't answer, or will answer with the wrong facts.

However, the examples are a mix, and e.g. the following ones seem quite reasonable for a model to be able to answer nowadays.

> type: Instructions that require historical knowledge, topic: Battle of Waterloo, instruction: What was the significance of the Battle of Waterloo in European history?
>
> type: Instructions that require technology knowledge, topic: Storage, instruction: What is the difference between a solid-state drive (SSD) and a hard disk drive (HDD)?

Could you give me some background on what led to the prompt being based on all "impossible" questions?
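As an aside, the generations quoted above follow a fixed `type: …, topic: …, instruction: …` pattern, so they are straightforward to parse into structured records. A minimal sketch of such parsing (the regex and function here are hypothetical illustrations, not code from the Dromedary repo):

```python
import re

# Expected shape of one TGRT generation:
# "type: <instruction type>, topic: <topic>, instruction: <instruction text>"
PATTERN = re.compile(
    r"type:\s*(?P<type>.+?),\s*topic:\s*(?P<topic>.+?),\s*instruction:\s*(?P<instruction>.+)"
)

def parse_tgrt_line(line: str):
    """Parse one generated line into a dict, or return None if it doesn't match."""
    match = PATTERN.match(line.strip())
    return match.groupdict() if match else None

example = (
    "type: Instructions that require historical knowledge, "
    "topic: Battle of Waterloo, "
    "instruction: What was the significance of the Battle of Waterloo in European history?"
)
record = parse_tgrt_line(example)
```

The non-greedy groups stop at the next `, topic:` / `, instruction:` delimiter, so commas inside the instruction text itself are preserved.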

  2. Balance of vanilla vs topic-based

     How many vanilla and how many topic-based samples did you generate, and are these datasets available anywhere?

  3. Count in prompt

     This is fairly minor, but the prompt has

    * 20 Hints:
    * 20 Instructions:

    But you run this with 10 items, correct?

Thank you for any clarifications.

Edward-Sun commented 1 year ago

Hi Sander,

> Could you give me some background on what led to the prompt being based on all "impossible" questions?

Yes. Some of these questions are actually hard to answer, but not impossible.

> How many vanilla and how many topic-based samples did you generate, and are these datasets available anywhere?

The merged dataset is available here. We have 267,597 synthetic prompts from Self-Instruct, and 99,121 synthetic prompts from TGRT Self-Instruct.
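For context, simple arithmetic on the counts quoted above gives the overall mix of the merged dataset (the numbers come from the reply; the computation is just illustrative):

```python
self_instruct = 267_597  # vanilla Self-Instruct prompts
tgrt = 99_121            # TGRT Self-Instruct prompts
total = self_instruct + tgrt   # merged dataset size
tgrt_share = tgrt / total      # fraction of topic-guided prompts
```

So roughly 27% of the merged synthetic prompts come from TGRT Self-Instruct.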

> But you run this with 10 items, correct?

Yes. Since the language model generates prompts autoregressively, this number can be any number greater than 10.
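Because decoding is autoregressive, generation can simply be cut off after the first n items regardless of the count promised in the prompt header. A minimal sketch of that post-hoc truncation (a hypothetical helper, assuming the model numbers its outputs "1. …", "2. …"):

```python
def take_first_n_items(completion: str, n: int = 10):
    """Keep only the first n numbered items from a model completion."""
    items = []
    for line in completion.splitlines():
        line = line.strip()
        # Numbered items are assumed to look like "3. some instruction"
        if line and line[0].isdigit() and ". " in line:
            items.append(line.split(". ", 1)[1])
            if len(items) >= n:
                break
    return items

# Even if the prompt header says "20 Instructions:", we can stop at 10.
completion = "\n".join(f"{i}. instruction {i}" for i in range(1, 21))
first_ten = take_first_n_items(completion, n=10)
```

In practice the same effect can also be achieved by using a stop sequence such as `"11."` during decoding, so the model never generates the remaining items at all.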