ethz-spylab / satml-llm-ctf

Code used to run the platform for the LLM CTF colocated with SaTML 2024

https://ctf.spylab.ai

MIT License

Create new secret without using up all guesses #47

Closed by epistoteles 10 months ago

epistoteles commented 10 months ago

Is there a way to generate a new secret without using up all 10 guesses? Currently, whenever I want a new chat with a new secret, I have to burn through all 10 guesses first. It would be nice if I could 'invalidate' the current secret faster. This would let me evaluate the robustness of my attacks with less effort.

dedeswim commented 10 months ago

We will try to add this ASAP.

epistoteles commented 10 months ago

Did you already make changes here? Right now EVERY new reconnaissance chat has a new secret. That makes it impossible to simulate the evaluation=True situation and see whether it's worth sending the same message again if the model failed to reveal the secret in the first chat (or whether it will just respond with exactly the same answer).

It would be nice if you could revert to the old behavior and generate a new secret only when requested (e.g. via an API call), not every time.

dedeswim commented 10 months ago

Hi, this is not intended behaviour. I'll work on a fix ASAP.

epistoteles commented 10 months ago

Seems we're back to the old behavior 👍

dedeswim commented 10 months ago

That's interesting, because I haven't changed anything yet. I will still work on the feature you asked for soon. Unfortunately, we had to prioritize other features and issues that were more urgent.

epistoteles commented 10 months ago

Okay, interesting. It still happens when I query FZI Llama (new secret every time), but not for UVA SRG Llama (secret stays the same).

epistoteles commented 10 months ago

I also believe that I entered the correct secret for FZI Llama with evaluation=True, but it responded that the secret is incorrect. Maybe that's just wishful thinking, but could it have to do with this erratic behavior?

dedeswim commented 10 months ago

I confirm that I can reproduce the issue with FZI Llama and I will look into fixing this issue now.

However, I double-checked, and your guess for FZI Llama's secret is incorrect.

epistoteles commented 10 months ago

> However, I double-checked, and your guess for FZI Llama's secret is incorrect.

Let a man dream 😢

dedeswim commented 10 months ago

I added this feature as an extra, optional parameter when creating attack chats. You can see how to use it in the API docs. Feel free to re-open the issue if it doesn't work.
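
For readers following the thread, here is a minimal sketch of the behaviour discussed above: the secret stays the same across new chats unless a fresh one is explicitly requested or all 10 guesses are used up. It is a toy in-memory model, not the platform's code. The class `SecretStore`, the method names, the `new_secret` flag, and the secret length are all assumptions made up for illustration; the real parameter name is in the API docs.

```python
import secrets
import string

MAX_GUESSES = 10   # from the thread: a secret is retired after 10 guesses
SECRET_LENGTH = 6  # assumption, chosen only for this sketch


def _random_secret() -> str:
    """Return a random alphanumeric string."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(SECRET_LENGTH))


class SecretStore:
    """Hypothetical model: one active secret per (user, defense) pair."""

    def __init__(self) -> None:
        self._secrets: dict[tuple[str, str], str] = {}
        self.guesses_left: dict[tuple[str, str], int] = {}

    def create_attack_chat(self, user: str, defense: str,
                           new_secret: bool = False) -> str:
        """Start a chat; reuse the current secret unless asked otherwise.

        The secret is returned only so the sketch can be inspected;
        a real platform would keep it hidden from the attacker.
        """
        key = (user, defense)
        exhausted = self.guesses_left.get(key, 0) <= 0
        if new_secret or key not in self._secrets or exhausted:
            self._secrets[key] = _random_secret()
            self.guesses_left[key] = MAX_GUESSES
        return self._secrets[key]

    def guess(self, user: str, defense: str, value: str) -> bool:
        """Spend one guess and report whether it matches the secret."""
        key = (user, defense)
        if self.guesses_left.get(key, 0) <= 0:
            raise RuntimeError("no guesses left; create a new chat")
        self.guesses_left[key] -= 1
        return value == self._secrets[key]
```

With this model, calling `create_attack_chat("alice", "some-defense")` twice returns the same secret, which allows resending the same message against an unchanged secret. Passing `new_secret=True` rotates the secret and resets the guess counter without spending the remaining guesses.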