ethz-spylab / satml-llm-ctf

Code used to run the platform for the LLM CTF colocated with SaTML 2024
https://ctf.spylab.ai
MIT License

A new secret is generated for every chat for some defense submissions #60

Closed dedeswim closed 8 months ago

dedeswim commented 10 months ago
Did you already make changes here? Right now EVERY new reconnaissance chat has a new secret. It's impossible to simulate the evaluation=True situation to see whether it's worth sending the same message again if the model failed to reveal the secret in the first chat (or whether it will just respond with exactly the same answer).

It would be nice if you could revert to the old behavior and generate a new secret only when requested (e.g. via an API call), not every time.

Originally posted by @epistoteles in https://github.com/ethz-spylab/satml-llms-ctf-issues/issues/47#issuecomment-1929469950

This happens for the following defenses:

dedeswim commented 10 months ago

@epistoteles have you made any guesses at all with FZI/Llama yet?

If not, can you please create a reconnaissance chat, make a guess and then create another chat and see if it's a different secret or still the same one?

The current behaviour is this: as long as you haven't made any guesses for a submission, every new reconnaissance chat you create gets a new secret. Once you have made your first guess, however, new chats should keep the same secret until you guess correctly or exhaust your guesses.
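
To make the rule above concrete, here is a minimal sketch of that logic. It is not the platform's actual code: `SubmissionState`, `secret_for_new_chat`, `record_guess`, and `MAX_GUESSES` are hypothetical names, and the guess limit is an assumed value. What happens after a correct guess or after the guesses run out is not described in this thread, so the sketch simply goes back to generating a fresh secret in that case.

```python
import secrets
from dataclasses import dataclass
from typing import Optional

MAX_GUESSES = 10  # assumed limit; the real value is not stated in this thread


@dataclass
class SubmissionState:
    """Hypothetical per-(attacker, defense submission) bookkeeping."""
    secret: Optional[str] = None
    guesses_made: int = 0
    solved: bool = False


def secret_for_new_chat(state: SubmissionState) -> str:
    """Return the secret a newly created reconnaissance chat would use."""
    # The secret is only "pinned" once at least one guess has been made,
    # and only while the attacker has neither solved it nor run out of guesses.
    pinned = (
        state.secret is not None
        and state.guesses_made > 0
        and not state.solved
        and state.guesses_made < MAX_GUESSES
    )
    if not pinned:
        # No guesses yet (or, by assumption, solved/exhausted): fresh secret.
        state.secret = secrets.token_hex(16)
    return state.secret


def record_guess(state: SubmissionState, candidate: str) -> bool:
    """Register one guess and report whether it matched the current secret."""
    state.guesses_made += 1
    if candidate == state.secret:
        state.solved = True
    return state.solved
```

Under these assumptions, two chats created before any guess get different secrets, while two chats created after a wrong guess share the secret of the most recent chat.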

This is admittedly somewhat odd behavior, but we don't consider it a major bug, and changing the current logic could disrupt chat creation as a whole, so we won't be fixing it for now given the sensitive stage of the competition.

However, if you do observe behavior different from what I described above, let me know.