Closed epistoteles closed 10 months ago
We will try to add this ASAP
Did you already make changes here? Right now EVERY new reconnaissance chat has a new secret. That makes it impossible to simulate the evaluation=True situation and check whether it's worth sending the same message again if the model failed to reveal the secret in the first chat (or whether it will just respond with exactly the same answer).
It would be nice if you could revert to the old behavior and generate a new secret only when requested (e.g. via an API call), not every time.
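To make the requested behavior concrete, here is a minimal sketch of the semantics: the secret stays the same across chats and is only regenerated when explicitly asked for. This is a toy in-memory model, not the competition's server code; `SecretStore`, `secret_for_chat`, and the `new_secret` flag are hypothetical names invented for illustration.

```python
import secrets


class SecretStore:
    """Toy model of the requested behavior (hypothetical, not the real server)."""

    def __init__(self):
        # Maps (user, defense) to the currently active secret.
        self._secrets = {}

    def secret_for_chat(self, user, defense, new_secret=False):
        """Return the active secret, generating one only if none exists
        yet or if the caller explicitly requests a fresh one."""
        key = (user, defense)
        if new_secret or key not in self._secrets:
            self._secrets[key] = secrets.token_hex(16)
        return self._secrets[key]


store = SecretStore()
first = store.secret_for_chat("alice", "FZI Llama")
second = store.secret_for_chat("alice", "FZI Llama")
fresh = store.secret_for_chat("alice", "FZI Llama", new_secret=True)
```

With this model, `first` and `second` are the same secret, so resending a message in a second chat targets the same value, while `fresh` is only produced on explicit request.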
Hi, this is not intended behaviour. I'll work on a fix ASAP.
Seems we're back to the old behavior 👍
That's interesting, because I haven't changed anything yet. I will still work on the feature you asked for soon. Unfortunately, we had to prioritize other features and issues that were more urgent.
Okay, interesting. It still happens when I query FZI Llama (new secret every time), but not for UVA SRG Llama (secret stays the same).
I also believe that I entered the correct secret for FZI Llama with evaluation=True, but it responded that the secret is incorrect. Maybe that's just wishful thinking, but could it have to do with this erratic behavior?
I can confirm that the issue is reproducible with FZI Llama, and I will look into fixing it now.
However, I double-checked, and your guess for FZI Llama's secret is incorrect.
Let a man dream 😢
I added this feature as an extra, optional parameter when creating attack chats. You can see how to use it in the API docs. Feel free to re-open if it doesn't work.
Is there a way to regenerate a new secret without using up all 10 guesses? Currently, I have to use up all 10 guesses whenever I want to create a new chat with a new secret, but it would be nice if I could 'invalidate' the current secret faster. This would help me evaluate the robustness of my attacks with less effort.
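As a rough sketch of what such an 'invalidate' operation could look like next to the existing 10-guess budget: the model below is purely illustrative. `SecretSlot`, `guess`, and `invalidate` are hypothetical names, and regenerating the secret once the guesses run out is an assumption based on the behavior described in this thread, not the actual implementation.

```python
import secrets
from dataclasses import dataclass, field

MAX_GUESSES = 10  # guess budget mentioned in this thread


def _new_secret() -> str:
    return secrets.token_hex(16)


@dataclass
class SecretSlot:
    """Hypothetical model of one secret with a limited guess budget."""

    value: str = field(default_factory=_new_secret)
    guesses_left: int = MAX_GUESSES

    def guess(self, candidate: str) -> bool:
        """Spend one guess; rotate the secret once the budget is exhausted."""
        correct = candidate == self.value
        self.guesses_left -= 1
        if self.guesses_left == 0:
            self.invalidate()
        return correct

    def invalidate(self) -> None:
        """Discard the current secret immediately and reset the budget."""
        self.value = _new_secret()
        self.guesses_left = MAX_GUESSES


slot = SecretSlot()
old_value = slot.value
slot.guess("not-the-secret")  # one wrong guess, nine remaining
slot.invalidate()             # fresh secret without burning the other nine
```

Calling `invalidate()` directly has the same effect as exhausting the budget, which is the shortcut being asked for here.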