Closed Charles20021201 closed 2 months ago
Hi @Charles20021201 ,
Thank you for participating in this challenge.
Regarding your query, please be assured that the models used in the competition are exactly as specified in the Submission Guide.
At this time, we're unable to provide specific details about the Judge LLM to ensure the integrity and fairness of the competition. We appreciate your understanding.
Hi,
Thanks for your quick reply!
I am fully aware that the weights are exactly the same.
But would you be so kind as to share whether any defenses or tricks not mentioned in the game rules are applied to the models?
Best, Charles
@Charles20021201 , please do not double post. If you intend to ping someone directly, kindly edit your comment and add the ping.
Ohh sorry for that.
Hi @Charles20021201 , All essential information required for successful participation in the challenge has already been provided on the challenge portal and submission guide.
Hi, thanks for your reply! I have no other questions.
Hi,
thanks for holding such a wonderful contest! I am having great fun participating in it.
My last submission achieves an ASR of 40.00 for Llama2 and 48.00 for Vicuna.
However, in my local test environment, where a Llama2-13B from HarmBench is used as the evaluator and greedy decoding is used for the two LLMs, the ASR is 88.00 and 90.00 respectively for the 50 behaviors provided.
The transferability of the baseline GCG from Llama2 to Vicuna is also much lower than expected (only 16.00), given that Vicuna is a model extremely vulnerable to attack.
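For reference, the ASR numbers above are simply the fraction of the 50 provided behaviors that the judge marks as successful, expressed as a percentage. A minimal sketch (the function name and verdict format are my own, not from the competition code):

```python
def attack_success_rate(verdicts):
    """ASR as a percentage: fraction of behaviors the judge
    labels as a successful attack (True) out of all behaviors."""
    return 100.0 * sum(verdicts) / len(verdicts)

# Example: 44 of the 50 behaviors judged successful -> ASR of 88.00
print(attack_success_rate([True] * 44 + [False] * 6))  # 88.0
```

So an ASR gap of 88.00 locally versus 40.00 on the leaderboard corresponds to roughly 24 behaviors being judged differently.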
My main question is that the two models seem much more robust than expected. Are there any defenses or tricks not mentioned in the game rules being applied to the models? Also, is the evaluator GPT-4?
Please forgive me if my questions are against the rules; any response would be appreciated.
Lastly, thanks for your effort in holding this contest!
Best, Charles