IBM / Dromedary

Dromedary: towards helpful, ethical and reliable LLMs.
GNU General Public License v3.0

Topic-guided red-teaming: issues with prompt/examples #10

Closed sanderland closed 1 year ago

sanderland commented 1 year ago
  1. Topic-guided red-teaming

I am trying to understand how this step works, and some differences between what the paper says and what the prompt implies.

The paper seems to imply this step is mainly for guiding instruction generation along topics, including difficult-to-answer ones, but the prompt explicitly asks for all instructions to be of the type

> a machine learning model can't answer, or will answer with the wrong facts.

However, the examples are a mix, and e.g. the following ones seem quite reasonable for a model to be able to answer nowadays.

> type: Instructions that require historical knowledge, topic: Battle of Waterloo, instruction: What was the significance of the Battle of Waterloo in European history?
>
> type: Instructions that require technology knowledge, topic: Storage, instruction: What is the difference between a solid-state drive (SSD) and a hard disk drive (HDD)?

Could you give me some background on what led to the prompt being based on all "impossible" questions?
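As an aside, the generations quoted above follow a fixed `type: …, topic: …, instruction: …` pattern, so they are straightforward to parse into structured records. A minimal sketch of such parsing (the regex and function here are hypothetical illustrations, not code from the Dromedary repo):

```python
import re

# Expected shape of one TGRT generation:
# "type: <instruction type>, topic: <topic>, instruction: <instruction text>"
PATTERN = re.compile(
    r"type:\s*(?P<type>.+?),\s*topic:\s*(?P<topic>.+?),\s*instruction:\s*(?P<instruction>.+)"
)

def parse_tgrt_line(line: str):
    """Parse one generated line into a dict, or return None if it doesn't match."""
    match = PATTERN.match(line.strip())
    return match.groupdict() if match else None

example = (
    "type: Instructions that require historical knowledge, "
    "topic: Battle of Waterloo, "
    "instruction: What was the significance of the Battle of Waterloo in European history?"
)
record = parse_tgrt_line(example)
```

The non-greedy groups stop at the next `, topic:` / `, instruction:` delimiter, so commas inside the instruction text itself are preserved.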

  2. Balance of vanilla vs topic-based

     How many vanilla and how many topic-based samples did you generate, and are these datasets available anywhere?

  3. Count in prompt

     This is fairly minor, but the prompt has

    * 20 Hints:
    * 20 Instructions:

    But you run this with 10 items, correct?

Thank you for any clarifications.

Edward-Sun commented 1 year ago

Hi Sander,

> Could you give me some background on what led to the prompt being based on all "impossible" questions?

Yes. Some of these questions are actually hard to answer, but not impossible.

> How many vanilla and how many topic-based samples did you generate, and are these datasets available anywhere?

The merged dataset is available here. We have 267,597 synthetic prompts from Self-Instruct, and 99,121 synthetic prompts from TGRT Self-Instruct.
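For context, simple arithmetic on the counts quoted above gives the overall mix of the merged dataset (the numbers come from the reply; the computation is just illustrative):

```python
self_instruct = 267_597  # vanilla Self-Instruct prompts
tgrt = 99_121            # TGRT Self-Instruct prompts
total = self_instruct + tgrt   # merged dataset size
tgrt_share = tgrt / total      # fraction of topic-guided prompts
```

So roughly 27% of the merged synthetic prompts come from TGRT Self-Instruct.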

> But you run this with 10 items, correct?

Yes. Since the language model generates prompts autoregressively, this number can be any number greater than 10.
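Because decoding is autoregressive, generation can simply be cut off after the first n items regardless of the count promised in the prompt header. A minimal sketch of that post-hoc truncation (a hypothetical helper, assuming the model numbers its outputs "1. …", "2. …"):

```python
def take_first_n_items(completion: str, n: int = 10):
    """Keep only the first n numbered items from a model completion."""
    items = []
    for line in completion.splitlines():
        line = line.strip()
        # Numbered items are assumed to look like "3. some instruction"
        if line and line[0].isdigit() and ". " in line:
            items.append(line.split(". ", 1)[1])
            if len(items) >= n:
                break
    return items

# Even if the prompt header says "20 Instructions:", we can stop at 10.
completion = "\n".join(f"{i}. instruction {i}" for i in range(1, 21))
first_ten = take_first_n_items(completion, n=10)
```

In practice the same effect can also be achieved by using a stop sequence such as `"11."` during decoding, so the model never generates the remaining items at all.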