AISG-Technology-Team / GCSS-Track-1A-Submission-Guide

Submission Guide + Discussion Board for AI Singapore Global Challenge for Safe and Secure LLMs (Track 1A).

[Team-Badass] - [charlesli2002] - [Regarding evaluation protocol] #7

Closed Charles20021201 closed 2 months ago

Charles20021201 commented 2 months ago

Hi,

Thanks for holding such a wonderful contest! I am having great fun participating in it.

My last submission achieved an attack success rate (ASR) of 40.00 on Llama2 and 48.00 on Vicuna.

However, in my local test environment, where a Llama2-13B from HarmBench is used as the evaluator and greedy decoding is used for the two LLMs, the ASR is 88.00 and 90.00 respectively for the 50 behaviors provided.
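
For anyone comparing numbers, here is a minimal sketch of how a local ASR figure like the ones above can be computed from per-behavior judge verdicts. The function name `attack_success_rate` and the boolean-verdict input are illustrative assumptions, not the contest's scoring code, which has not been disclosed.

```python
from typing import Sequence


def attack_success_rate(verdicts: Sequence[bool]) -> float:
    """Return the ASR as a percentage, rounded to two decimals.

    Each entry in `verdicts` is True when the evaluator judged the
    model's response to that behavior as a successful attack.
    """
    if not verdicts:
        raise ValueError("need at least one verdict to compute an ASR")
    successes = sum(1 for v in verdicts if v)
    return round(100.0 * successes / len(verdicts), 2)


# With 50 behaviors, 44 successes give 88.00 and 45 give 90.00.
llama2_local = [True] * 44 + [False] * 6
vicuna_local = [True] * 45 + [False] * 5
print(f"{attack_success_rate(llama2_local):.2f}")  # 88.00
print(f"{attack_success_rate(vicuna_local):.2f}")  # 90.00
```

A gap between a local ASR and a leaderboard ASR can come from the judge alone, since a different evaluator may label the same responses differently.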

The transferability of the baseline GCG from Llama2 to Vicuna is also much lower than expected (only 16.00), given that Vicuna is a model extremely vulnerable to attack.

My main concern is that the two models seem much more robust than expected. Are any defenses or tricks applied to the models that are not mentioned in the game rules? Also, is the evaluator GPT-4?

Please forgive me if my questions are against the rules. Any response would be appreciated.

Lastly, thanks for your effort in holding this contest!

Best, Charles

aisg-py commented 2 months ago

Hi @Charles20021201,

Thank you for participating in this challenge.

Regarding your query, please be assured that the models used in the competition are exactly as specified in the Submission Guide.

At this time, we're unable to provide specific details about the Judge LLM to ensure the integrity and fairness of the competition. We appreciate your understanding.

Charles20021201 commented 2 months ago

Hi,

Thanks for your quick reply!

I am fully aware that the weights are exactly the same.

But would you be so kind as to share whether any defenses or tricks not mentioned in the game rules are being applied to the models?

Best, Charles

mtmak-aisg commented 2 months ago

@Charles20021201, please do not double post. If you intend to ping someone directly, kindly edit your comment and add the ping.

Charles20021201 commented 2 months ago

Oh, sorry about that.

aisg-py commented 2 months ago

Hi @Charles20021201, all essential information required for successful participation in the challenge has already been provided on the challenge portal and in the submission guide.

Charles20021201 commented 2 months ago

Hi, thanks for your reply! I have no other questions.