Princeton-SysML / Jailbreak_LLM


(Maybe) Bugs in the code. #3

Open Junjie-Chu opened 12 months ago

Junjie-Chu commented 12 months ago

I noticed that in attack.py, you use:

if "falcon" in fname or "mpt" in fname:

However, before this check, you have:

if args.use_system_prompt:
   fname += "_with_sys_prompt"

This means that if you use the options --Vicuna --use_system_prompt, the suffix "_with_sys_prompt" makes "mpt" a substring of fname (from "...prompt"), so the code takes the mpt branch.

I believe mpt refers to MPT-7B, so this is likely not the behavior we want.

This problem may lead to incorrect evaluation of results with/without system prompts.
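To make the collision concrete, here is a minimal, self-contained reproduction. The base filename and model name are made up for illustration; only the suffix logic mirrors attack.py, and the suggested fix is just one possible approach, not the repository's actual patch:

# Standalone reproduction of the substring collision.
fname = "vicuna_outputs"   # illustrative base name for a Vicuna run
use_system_prompt = True

if use_system_prompt:
    fname += "_with_sys_prompt"

# "_with_sys_prompt" ends in "...prompt", which contains "mpt",
# so the branch check matches even though the model is Vicuna:
print("falcon" in fname or "mpt" in fname)   # True

# One possible fix: branch on the model name before any suffixes
# are appended (or on the model argument itself).
model_name = "vicuna"
print(model_name in ("falcon", "mpt"))       # False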

Hazelsuko07 commented 11 months ago

Hi Junjie,

Thanks for reporting this issue (and that is a great catch)! I've fixed it.

I'd like to reassure you that this correction does not impact our results. The mpt/falcon branch differs from the others in two ways:

  1. mpt/falcon require trust_remote_code to be set to True. For the other models in our evaluation (Vicuna and LLaMA), enabling trust_remote_code does not change model behavior.
  2. mpt/falcon do not support generating outputs from sentence embeddings (which is what we do for the other models), so we generate directly from tokenized sentences instead (see the sketch below).

Neither of these distinctions will affect the attack's performance.
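For readers following this thread, the two differences above might look roughly like the sketch below. This is an illustration using the Hugging Face transformers API (generating from inputs_embeds requires a reasonably recent transformers version), not the repository's exact code; the model names and generation arguments are placeholders:

from transformers import AutoModelForCausalLM, AutoTokenizer

prompt = "Some prompt"

# Branch 1: mpt/falcon ship custom modeling code, so loading them
# requires trust_remote_code=True; generation runs on token ids.
tok = AutoTokenizer.from_pretrained("mosaicml/mpt-7b")
model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b", trust_remote_code=True
)
input_ids = tok(prompt, return_tensors="pt").input_ids
out = model.generate(input_ids, max_new_tokens=32)

# Branch 2: Vicuna/LLaMA can instead generate from input embeddings,
# which is what the evaluation does for those models.
tok2 = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.3")
model2 = AutoModelForCausalLM.from_pretrained("lmsys/vicuna-7b-v1.3")
input_ids2 = tok2(prompt, return_tensors="pt").input_ids
inputs_embeds = model2.get_input_embeddings()(input_ids2)
out2 = model2.generate(inputs_embeds=inputs_embeds, max_new_tokens=32)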