ethz-spylab / satml-llm-ctf

Code used to run the platform for the LLM CTF colocated with SaTML 2024
https://ctf.spylab.ai
MIT License

Reconnaissance phase knowledge #16

Closed epistoteles closed 11 months ago

epistoteles commented 11 months ago

In the reconnaissance phase, will the attackers know the ground truth of the secret S (or, alternatively, already have access to the judging endpoint)? Or will they just be able to interact with the defenses as they please, without knowing their true success rate?

Also, will defenses be identifiable by e.g. team name? Or will they be given random IDs? In other words, will it be possible to identify one's own model, if a team takes part both in the defender and attacker role? What is the policy on interacting with your own defense as attackers (as this is not "collaboration between teams" in the strict sense)?

dpaleka commented 11 months ago

We'll get back to you on the first question. Our plan was for attackers not to know the secret, but for the secret to be fixed for each Reconnaissance endpoint (a different secret for every defense). This is not set in stone yet, either in the rules or in the implementation. Are there important downsides to this setting?

> In other words, will it be possible to identify one's own model, if a team takes part both in the defender and attacker role? What is the policy on interacting with your own defense as attackers (as this is not "collaboration between teams" in the strict sense)?

The defenses will have anonymized display IDs. Attacks are not scored for breaking the attacking team's own defense, and vice versa, so the main concern is to prevent teams from wasting effort attacking their own defense. We will indicate which defense is yours through a display setting when you are logged in, a warning returned by the Reconnaissance API, or both.

dedeswim commented 11 months ago

Hi @epistoteles, you can find the updated rules with more details about the reconnaissance and evaluation phases on the website. TL;DR: you can test your guess for the secret up to K times (currently set to 10, but we could change it based on how many defences are submitted).

I am closing the issue. Feel free to re-open it if this does not clarify your doubts.
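
For readers who want a concrete picture of the setup discussed in this thread, here is a minimal Python sketch of how a Reconnaissance endpoint with a fixed hidden secret, a budget of K guesses, and an own-defense warning might behave. It is not the platform's code: the class `ReconDefense`, its fields, the response keys, and the choice to count guesses per team and per defense are all assumptions made for illustration. Only K = 10, the anonymized display IDs, and the per-defense fixed secret come from the comments above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# K from the thread: currently 10, and the organizers said it may change.
MAX_GUESSES = 10


@dataclass
class ReconDefense:
    """Hypothetical model of one Reconnaissance endpoint (illustration only)."""

    display_id: str   # anonymized ID shown to attackers
    owner_team: str   # assumed field: the team that submitted this defense
    secret: str       # fixed per defense and never shown to attackers
    # Assumption: the guess budget is tracked per team for each defense.
    guesses_used: Dict[str, int] = field(default_factory=dict)

    def check_guess(self, team: str, guess: str) -> Dict[str, Optional[object]]:
        used = self.guesses_used.get(team, 0)
        if used >= MAX_GUESSES:
            # Budget spent: the guess is not evaluated at all.
            return {"correct": None, "remaining": 0,
                    "warning": "guess budget exhausted"}
        self.guesses_used[team] = used + 1
        # Warn teams that are probing their own defense.
        warning = "this is your own defense" if team == self.owner_team else None
        return {"correct": guess == self.secret,
                "remaining": MAX_GUESSES - used - 1,
                "warning": warning}


# Usage with made-up team names and a made-up secret.
defense = ReconDefense(display_id="defense-01", owner_team="team-a", secret="s3cr3t")
print(defense.check_guess("team-b", "wrong"))
# {'correct': False, 'remaining': 9, 'warning': None}
```

In this sketch an exhausted budget returns `correct: None` rather than `False`, so a caller can tell "wrong guess" apart from "guess not evaluated". How the real API reports either case is specified in the rules on the website, not here.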