API and rules for checking the password

NivC commented 11 months ago

Dear SPY Lab,

Thank you very much for organizing this competition!

A small question please: What are the API and rules for checking the password? Does the attacker have to give a correct password in the first try to qualify? If not, how many tries does the attacker have?

Also, could you clarify please the part about "Anything that puts the secret completely out of scope"? (under filters goal and scope)

Best, Niv.

dpaleka commented 11 months ago

Thanks for taking part in our competition!

> Also, could you clarify please the part about "Anything that puts the secret completely out of scope"? (under filters goal and scope)

The discussion in https://github.com/ethz-spylab/satml-llms-ctf-issues/issues/13 contains an example of a defense that is disallowed by the rules in that section, including the rule you mention.

> Does the attacker have to give a correct password in the first try to qualify? If not, how many tries does the attacker have?

Here are the preliminary rules (blue means not determined yet):

We will publish the full attack rules and the API before the Attack phase commences. The main scientific merit of this competition is figuring out whether a utility-preserving defense (of the specified form) exists that is hard to break. Eliciting this via scoring mechanisms in the Attack phase is not completely straightforward, for various reasons. Our concerns are highly dependent on the number of good defenses submitted, so this part of the rules can only be fixed after the Defense phase deadline.

> What are the API and rules for checking the password?

I'm not sure what this means.

NivC commented 11 months ago

Thanks for the response! Let's say I know the secret is one of 5 strings that I have. In that case, I know a lot about the right password, but I will probably still not get it right on the first try. Please note that this number of tries is different from the number of interactions with the LLM. Niv.

dedeswim commented 11 months ago

You will have a limited number of attempts (more than one) to guess the correct secret, in both the reconnaissance and the evaluation phases.
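
To make the distinction between guess attempts and LLM interactions concrete, here is a minimal sketch of how a guess budget could behave. It is not the competition API, which has not been published yet: `SecretChecker`, `try_candidates`, and the budget of 3 are all hypothetical names and values chosen for illustration.

```python
import hmac
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SecretChecker:
    """Hypothetical guess checker; NOT the competition's actual API."""

    secret: str
    max_attempts: int  # assumed value; the thread only says "more than one"
    used: int = 0      # counts secret guesses only, not LLM interactions

    def guess(self, candidate: str) -> bool:
        """Spend one attempt and report whether the candidate is the secret."""
        if self.used >= self.max_attempts:
            raise RuntimeError("no guess attempts left")
        self.used += 1
        # Constant-time comparison, so response timing does not leak the secret.
        return hmac.compare_digest(candidate.encode(), self.secret.encode())


def try_candidates(checker: SecretChecker, candidates: Sequence[str]) -> Optional[str]:
    """Submit candidates in order until one matches or the budget runs out."""
    remaining = checker.max_attempts - checker.used
    for candidate in candidates[:remaining]:
        if checker.guess(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Five candidate strings, as in the example above, with an assumed budget of 3.
    checker = SecretChecker(secret="s3", max_attempts=3)
    print(try_candidates(checker, ["s1", "s2", "s3", "s4", "s5"]))
```

Under these assumptions, an attacker who has narrowed the secret down to 5 equally likely strings and has 3 attempts succeeds with probability 3/5, regardless of how many LLM interactions it took to obtain the candidates.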

NivC commented 11 months ago

Thanks!

ethz-spylab / satml-llm-ctf

API and rules for checking the password #19