Closed: caesar-jojo closed this issue 1 year ago
Hi, the simplest way to generate dishonest responses (though probably not the best-performing one) is to use a negative coefficient in the control portion of the honesty.ipynb notebook.
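To illustrate why flipping the sign works, here is a toy sketch in plain Python. It is not the notebook's code: `steer`, `projection`, the vectors, and the coefficient value are all hypothetical stand-ins. It only assumes that control adds `coeff * direction` to a layer's hidden state, so a negative coefficient pushes the activation away from the "honesty" direction instead of toward it.

```python
# Toy illustration only: every name and number here is a hypothetical stand-in,
# not taken from honesty.ipynb.

def steer(hidden, direction, coeff):
    """Return hidden + coeff * direction, computed elementwise."""
    return [h + coeff * d for h, d in zip(hidden, direction)]

def projection(vec, direction):
    """Dot product of vec with direction (how far vec points along it)."""
    return sum(v * d for v, d in zip(vec, direction))

honesty_direction = [1.0, 0.0, 0.0]  # stand-in for an extracted "honesty" direction
hidden_state = [0.5, 2.0, -1.0]      # stand-in for one layer's activation

more_honest = steer(hidden_state, honesty_direction, 4.0)   # positive coefficient
less_honest = steer(hidden_state, honesty_direction, -4.0)  # negative coefficient

base = projection(hidden_state, honesty_direction)
up = projection(more_honest, honesty_direction)
down = projection(less_honest, honesty_direction)

print(base, up, down)  # the negative coefficient lowers the projection
```

In the notebook itself, the equivalent change would be to negate the coefficient value used in the control cell. The magnitude that gives fluent output likely needs some tuning.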
@caesar-jojo Hi, we have just added an example, honesty_mistral, that generates dishonest responses with mistralai/Mistral-7B-Instruct-v0.1.
Hello, author, how can I generate dishonest responses?