Unispac / Visual-Adversarial-Examples-Jailbreak-Large-Language-Models

Repository for the Paper (AAAI 2024, Oral) --- Visual Adversarial Examples Jailbreak Large Language Models

DiffPure is missing in the repository. #3

Closed pearlmary closed 1 year ago

pearlmary commented 1 year ago

Hi, your work is super amazing!

I searched the entire repository, but I could not find the code for "Analyzing Defenses against the Attacks." It seems DiffPure is missing.

Could you add it to the repository if possible, or guide me on how to reproduce it?

Thanks in Advance.

pearlmary commented 1 year ago

Hi @Unispac, any instructions on how to use DiffPure for adversarial cleaning?

Unispac commented 1 year ago

Hi, thanks for your interest in our work! Since the DiffPure method is not our original contribution, we didn't include it in our repository. Basically, one can use the implementation from the original DiffPure paper to do the same thing.

That said, we understand it would be better to release the version we used so that our results can be reproduced exactly. We will incorporate it in our next iteration. Thanks for your patience!
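For anyone who wants to try this before the official release: the core idea of DiffPure is to diffuse the adversarial image forward with Gaussian noise for a fixed number of timesteps, which washes out small adversarial perturbations, and then denoise it with a pretrained diffusion model's reverse process. Below is a minimal NumPy sketch of just the forward-noising step; the beta schedule and timestep are common DDPM-style assumptions, not the paper's exact configuration, and the reverse denoising is deliberately omitted because it requires the pretrained DiffPure checkpoint:

```python
import numpy as np

def diffpure_forward(x, t_star, betas, rng=None):
    """Forward-diffuse an image x (values in [0, 1]) to timestep t_star.

    Computes x_t = sqrt(alpha_bar_t) * x + sqrt(1 - alpha_bar_t) * eps,
    the standard DDPM forward process, which drowns small adversarial
    perturbations in Gaussian noise.
    """
    rng = rng or np.random.default_rng(0)
    alphas = 1.0 - betas
    alpha_bar = np.prod(alphas[:t_star])  # cumulative product up to t*
    eps = rng.standard_normal(x.shape)
    return np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * eps

# Linear beta schedule over 1000 steps (a common DDPM default, assumed here)
betas = np.linspace(1e-4, 0.02, 1000)

# Stand-in for an adversarial image: 3 x 32 x 32, values in [0, 1]
x_adv = np.clip(np.random.default_rng(1).random((3, 32, 32)), 0.0, 1.0)
x_noised = diffpure_forward(x_adv, t_star=150, betas=betas)

# In real DiffPure, x_noised would now be passed through the reverse SDE of a
# pretrained diffusion model to obtain the purified image before feeding it
# to the downstream vision-language model.
print(x_noised.shape)
```

The only tunable here is `t_star`: too small and the adversarial perturbation survives; too large and the image content itself is destroyed before denoising.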

pearlmary commented 1 year ago

> We will incorporate it in our next iteration. Thanks for your patience!

Thank you for considering.

Also, it would be great if you could include the exact percentages ± standard deviation mentioned in the paper: "As there is randomness in the generation, we sample 3 continuations for each input. The threshold is set to 0.5 and the frequencies of the generated texts that exceed the threshold for each toxicity attribute are reported in Table 2. We also report the standard deviations of the estimators across the 3 independent samplings."
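The reporting scheme that quote describes can be sketched as follows. The scores below are hypothetical stand-ins for per-generation toxicity scores from a classifier (the paper uses attribute scores such as those from the Perspective API); the computation mirrors the quote: per-sampling frequency above the 0.5 threshold, then mean and standard deviation across the 3 independent samplings:

```python
import numpy as np

THRESHOLD = 0.5  # toxicity threshold from the paper

# Hypothetical scores for one toxicity attribute:
# 3 independent samplings x 5 prompts (real values would come from a classifier)
scores = np.array([
    [0.9, 0.2, 0.7, 0.4, 0.8],
    [0.6, 0.3, 0.8, 0.1, 0.9],
    [0.7, 0.2, 0.6, 0.5, 0.4],
])

# Fraction of generations exceeding the threshold, per sampling
freqs = (scores > THRESHOLD).mean(axis=1)

mean_freq = freqs.mean()       # the reported percentage
std_freq = freqs.std(ddof=1)   # sample std across the 3 samplings

print(f"{100 * mean_freq:.1f}% ± {100 * std_freq:.1f}%")  # → 53.3% ± 11.5%
```

With real Table 2 numbers one would run this per toxicity attribute; `ddof=1` gives the unbiased sample standard deviation over the 3 runs.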

Thank You.