Closed PinzhengWang322 closed 7 months ago
Hi, thanks for the great and meaningful work! When I use the Fact-erasure defense with the Input Rephrasing attack on gpt2-xl, I get this result:
```
saving csv at ..//results/gpt2-xl_ROME_outputs_cf_filt_editing_sweep_ws-[1]_layer-17_fact-erasure_no_margin_no_entropy_n700_t700_secondhalfFalse_analytical_solnFalse_top-4_grad_attk_lyr_mpFalse_parap_attack_4_mg_cfdefFalse_samp5.csv...
results shape: (700, 55)
final metrics:
  actual_retain_rate:    0.898
  actual_retain_rate_n:  0.746
  delta_accuracy:        0.012
  delta_accuracy_neigh:  0.032
  retain_rate_neighbor:  0.488
  retain_rate_pre:       1.000
  retain_rate_neighbor:  0.456
  retain_rate:           0.988
  attack_frac:           0.047
  tgt_in_sample:         0.349
  post_rewrite_success:  1.000
  rewrite_prob_diff:    -0.285
  rewrite_post_prob:     0.008
  rewrite_score:         0.930
  post_paraphrase_succ:  0.999
  paraphrase_prob_diff: -0.128
  paraphrase_pre_prob:   0.153
  paraphrase_post_prob:  0.024
  paraphrase_score:      0.755
  post_neighborhood_su:  0.000
  neighborhood_prob_di: -0.017
  neighborhood_score:    0.973
  essence_ppl_diff:      3.271
```
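If it helps to double-check a metric independently of the printed summary, the per-case values can be recomputed from the saved results CSV. A minimal sketch using only the standard library (the column name `attack_success` is hypothetical; the real file's 55 columns may be named differently):

```python
import csv
import io

# Stand-in for the saved results CSV; in practice, open the file under
# ../results/ instead. The "attack_success" column is illustrative only.
sample = io.StringIO(
    "case_id,attack_success\n"
    "0,1\n"
    "1,0\n"
    "2,0\n"
    "3,1\n"
)

rows = list(csv.DictReader(sample))
# Fraction of cases where the attack recovered the erased fact.
attack_frac = sum(int(r["attack_success"]) for r in rows) / len(rows)
print(f"attack_frac: {attack_frac:.3f}")  # 0.500 on this toy data
```

Comparing a recomputed fraction like this against the `final metrics` line can at least rule out an aggregation issue on your side.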
And this is the script I used:
```shell
python3 -m experiments.evaluate_parap_attack \
    --alg_name ROME \
    --ds_name cf_filt \
    --model_name gpt2-xl \
    --run 1 \
    --correctness_filter 1 \
    --norm_constraint 1e-4 \
    --kl_factor .0625 \
    --gpu 2 \
    --edit_layer 17 \
    -n 700 \
    --datapoints_to_execute 700 \
    --num_attack_parap 4 \
    --retain_rate \
    --attack mg \
    --bb_num_samples 5 \
    --fact_erasure
```
This is inconsistent with the results in the paper. I'm wondering if I'm missing something. Thank you for your time and assistance.
The numbers in the appendix may differ: some of them were computed with a smaller attack budget. Go with what the script gives you.
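Since the result depends on the attack budget, one way to gauge sensitivity is a small sweep over `--num_attack_parap`. A sketch reusing only flags from the script above; it prints the commands rather than running them (drop the `echo` to execute):

```shell
# Sweep the rephrasing-attack budget and compare attack_frac across runs.
# All flags are taken from the user's script; adjust --gpu etc. as needed.
for budget in 1 2 4 8; do
  echo python3 -m experiments.evaluate_parap_attack \
    --alg_name ROME --ds_name cf_filt --model_name gpt2-xl \
    --edit_layer 17 -n 700 --num_attack_parap "$budget" \
    --attack mg --bb_num_samples 5 --fact_erasure
done
```

If attack_frac climbs noticeably with the budget, that would explain a gap between the script's output and numbers computed at a smaller budget.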