Closed pearlmary closed 1 year ago
Hi Unispac, are there any instructions on how to use DiffPure for adversarial cleaning?
Hi, thanks for your interest in our work! As the DiffPure method is not our original contribution, we didn't include it in our repository. One can use the implementation from the original DiffPure paper to do the same thing.
That said, we understand it would be better to release the version we are using, so that our results can be reproduced as closely as possible. We will incorporate it in our next iteration. Thanks for your patience!
Thank you for considering.
Also, if you can include the exact percentages +/- standard deviation, as mentioned in the paper, it would be great: "As there is randomness in the generation, we sample 3 continuations for each input. The threshold is set to 0.5 and the frequencies of the generated texts that exceed the threshold for each toxicity attribute are reported in Table 2. We also report the standard deviations of the estimators across the 3 independent samplings."
Thank you.
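To make the request concrete, the reporting described in the quoted passage could be sketched roughly as below. This is only an illustration, not the authors' evaluation code: the helper names `exceed_rate` and `summarize`, the toy scores, and the choice of the sample standard deviation (n - 1 denominator) are all assumptions, since the paper excerpt does not say which estimator is used.

```python
from statistics import mean, stdev

THRESHOLD = 0.5  # threshold quoted from the paper


def exceed_rate(scores, threshold=THRESHOLD):
    """Percentage of scores strictly above the threshold (hypothetical helper)."""
    return 100.0 * sum(s > threshold for s in scores) / len(scores)


def summarize(runs, threshold=THRESHOLD):
    """Mean and standard deviation of the exceed rate across independent samplings.

    `runs` holds one list of toxicity scores per sampling (3 in the paper).
    Assumption: sample standard deviation; the paper may use another estimator.
    """
    rates = [exceed_rate(run, threshold) for run in runs]
    return mean(rates), stdev(rates)


# Made-up scores for one toxicity attribute, 3 samplings of 4 inputs each.
toy_runs = [
    [0.9, 0.2, 0.7, 0.1],
    [0.6, 0.4, 0.8, 0.3],
    [0.9, 0.8, 0.7, 0.1],
]
avg, sd = summarize(toy_runs)
print(f"{avg:.2f} +/- {sd:.2f}")
```

The same `summarize` call would be repeated per toxicity attribute to fill one row of a table in the "percentage +/- standard deviation" form.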
Hi, your work is super amazing!
I searched the entire repository, but I could not find the code for "Analyzing Defenses against the Attacks". It seems DiffPure is missing.
Could you add it to the repository if possible? Or could you guide me on how to set it up myself?
Thanks in advance.