payphone131 closed this issue 5 days ago
Hello @payphone131,
Our dataset generation pipeline involves multiple calls to the OpenAI API. Due to the potential harmfulness of the generated instructions, there could be occasional refusals. If the refusal rate is too high, you might consider the following methods:
Additionally, you can directly use our published instructions.
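One possible way to handle refusals before the next pipeline stage is to drop them by matching common refusal phrases. This is only a minimal sketch, not necessarily what the authors do: it assumes the generated instructions are available as a list of strings, and the marker list and the function names `is_refusal` and `filter_refusals` are hypothetical.

```python
# Hypothetical refusal markers; extend this tuple after inspecting your own outputs.
REFUSAL_MARKERS = (
    "sorry, but i can't",
    "i'm sorry",
    "i can't assist",
    "i cannot assist",
)


def _normalize(text: str) -> str:
    """Lowercase the text and replace curly apostrophes with straight ones."""
    return text.lower().replace("\u2019", "'").strip()


def is_refusal(text: str) -> bool:
    """Return True if the text contains any of the known refusal markers."""
    norm = _normalize(text)
    return any(marker in norm for marker in REFUSAL_MARKERS)


def filter_refusals(instructions):
    """Return (kept_instructions, refusal_rate) for a list of strings."""
    kept = [t for t in instructions if not is_refusal(t)]
    rate = 1 - len(kept) / len(instructions) if instructions else 0.0
    return kept, rate
```

Checking the returned refusal rate first can help decide whether filtering is enough or whether the generation step should be rerun.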
Thanks for your reply, it is really useful. By the way, have you noticed that the ASR calculated by beaver-7b changes between runs? Specifically, I found that "flag_num" in line 148 of evaluate_for_black_box.py changes every time I run the code. Is this normal?
We also observed this phenomenon in our experiments. Since Beaver-dam-7B is a language model, it may generate different responses to the same query. However, we found the variation to be acceptable. We also examined some of the inputs that caused the model's responses to vary and found that their harmfulness is indeed rather ambiguous.
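To judge whether the run-to-run variation is acceptable for your own setup, one option is to repeat the evaluation several times and report the mean and spread. The sketch below assumes that ASR is computed as "flag_num" divided by the number of evaluated samples, which this thread does not confirm; `asr` and `summarize_runs` are hypothetical helper names, not functions from the repository.

```python
from statistics import mean, stdev


def asr(flag_num: int, total: int) -> float:
    """Attack success rate, ASSUMED here to be flag_num / total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return flag_num / total


def summarize_runs(flag_nums, total):
    """Return (mean ASR, sample standard deviation) over repeated runs."""
    rates = [asr(n, total) for n in flag_nums]
    # stdev needs at least two values; report 0.0 for a single run.
    spread = stdev(rates) if len(rates) > 1 else 0.0
    return mean(rates), spread
```

Reporting the mean together with the standard deviation makes it easier to compare results across settings than quoting a single run.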
Thank you for this really interesting work. Have you noticed that the instructions generated by generate_instructions.py contain many refusals like "Sorry, but I can't assist with that."? Should I ignore these refusals and continue to run amplifying_toxic.sh, or do something to handle them? Looking forward to your reply.