Closed: ysy970923 closed this issue 9 months ago
Hi @ysy970923, thanks for submitting this question! You're right that the feedback mechanism in this example is just the response from the target LLM.
More generally, we can envision cases where the score is required. Let's say our target model generates images (which we can't pass back into the Red Teaming LLM directly); then we need a score (or textual feedback) to be passed back to the Red Teaming LLM. As you can see, there are different kinds of setups, but we didn't want to overcomplicate the diagram either, so it just mentions feedback from the scoring engine.
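To make the two setups concrete, here is a minimal sketch of such a loop. It is not the library's actual API: `red_team_loop`, `toy_attacker`, `toy_image_target`, and `toy_scorer` are all hypothetical names made up for illustration, and the threshold and turn count are arbitrary. The only point is the branch at the end: a text response can go back to the red teaming side as-is, while a non-text output (for example, image bytes) has to be replaced by the scorer's textual feedback.

```python
from typing import Callable, Dict, List, Tuple


def red_team_loop(
    attacker: Callable[[str], str],
    target: Callable[[str], object],
    scorer: Callable[[object], Tuple[float, str]],
    objective: str,
    max_turns: int = 3,
    threshold: float = 0.8,
) -> List[Dict[str, object]]:
    """Hypothetical feedback loop; all names here are illustrative assumptions."""
    history: List[Dict[str, object]] = []
    feedback = objective  # first turn: the attacker only knows the objective
    for turn in range(max_turns):
        prompt = attacker(feedback)
        output = target(prompt)
        score, rationale = scorer(output)
        history.append({"turn": turn, "prompt": prompt, "score": score})
        if score >= threshold:
            break  # objective reached, stop early
        if isinstance(output, str):
            # Text target: the raw response is itself usable feedback.
            feedback = output
        else:
            # Non-text target (e.g. an image): fall back to the scorer's text.
            feedback = f"score={score:.2f}; {rationale}"
    return history


# Toy stand-ins (assumptions, not real models) so the sketch runs on its own.
def toy_attacker(feedback: str) -> str:
    return f"Try again given: {feedback}"


def toy_image_target(prompt: str) -> bytes:
    return b"fake-image-bytes"


def toy_scorer(output: object) -> Tuple[float, str]:
    return 0.1, "image did not match the objective"


history = red_team_loop(
    toy_attacker, toy_image_target, toy_scorer, "draw X", max_turns=2
)
print(history[1]["prompt"])
```

With the image target, the second prompt is built from the scorer's feedback rather than from the target's output, which is the case the diagram is describing.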
I hope that helps! Please let us know if you have further thoughts on the topic; otherwise, we'll close the issue within the next 7 days.
Thanks so much for the response 👍
Are there plans to add examples for image generation? If not, can I contribute some examples for text-to-image models?
Yes, that's definitely relevant.
I'm not sure to what extent you already have these or are planning to work on them, but you can certainly open a PR if you already have something, and we can comment there. If you're just starting out, it's probably faster and simpler to write up a short outline and share it (in a new issue, since the original question was answered and is unrelated) so that you can get quick feedback before spending too much time on it. What do you think?
I did some work on this, so I made a pull request. Feel free to give comments :) Thank you
I have one question.
It seems that there is no mechanism for the calculated score to affect the red team bot.
But based on the image above, the scoring engine feedback is affecting the generated prompt.
Please let me know if I missed something. Thank you.