Vaidehi99 / InfoDeletionAttacks

MIT License

Some problems with reproducing the results #4

Closed: PinzhengWang322 closed this issue 7 months ago

PinzhengWang322 commented 7 months ago

Hi, thanks for the great and meaningful work! When I use the Fact-erasure defense with the Input Rephrasing attack on gpt2-xl, I get this result:

saving csv at ..//results/gpt2-xl_ROME_outputs_cf_filt_editing_sweep_ws-[1]_layer-17_fact-erasure_no_margin_no_entropy_n700_t700_secondhalfFalse_analytical_solnFalse_top-4_grad_attk_lyr_mpFalse_parap_attack_4_mg_cfdefFalse_samp5.csv...
results shape:  (700, 55)

final metrics: 
 actual_retain_rate: 0.898
 actual_retain_rate_n: 0.746
 delta_accuracy: 0.012
 delta_accuracy_neigh: 0.032
 retain_rate_neighbor: 0.488
 retain_rate_pre: 1.000
 retain_rate_neighbor: 0.456
 retain_rate: 0.988
 attack_frac: 0.047
 tgt_in_sample: 0.349
 post_rewrite_success: 1.000
 rewrite_prob_diff: -0.285
 rewrite_post_prob: 0.008
 rewrite_score: 0.930
 post_paraphrase_succ: 0.999
 paraphrase_prob_diff: -0.128
 paraphrase_pre_prob: 0.153
 paraphrase_post_prob: 0.024
 paraphrase_score: 0.755
 post_neighborhood_su: 0.000
 neighborhood_prob_di: -0.017
 neighborhood_score: 0.973
 essence_ppl_diff: 3.271
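When comparing this output against the paper's table, note that `retain_rate_neighbor` is printed twice with different values (0.488 and 0.456), so copying the block into a plain dict would silently keep only the last one. The helper below is a minimal sketch, not part of the repository: it assumes the log has exactly the `name: value` layout shown above and keeps every printed value per key.

```python
from __future__ import annotations


def parse_final_metrics(log_text: str) -> dict[str, list[float]]:
    """Collect 'name: value' pairs printed after the 'final metrics:' line.

    Values are stored as lists because the same key can be printed more
    than once (retain_rate_neighbor appears twice in the log above).
    """
    metrics: dict[str, list[float]] = {}
    in_block = False
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("final metrics"):
            in_block = True
            continue
        if not in_block or not line:
            continue
        name, sep, value = line.partition(":")
        if not sep:
            break  # a line without a colon ends the metrics block
        try:
            metrics.setdefault(name.strip(), []).append(float(value))
        except ValueError:
            break  # non-numeric value: we have left the metrics block
    return metrics


# Excerpt copied from the log in this issue.
LOG = """final metrics:
 actual_retain_rate: 0.898
 retain_rate_neighbor: 0.488
 retain_rate_neighbor: 0.456
 retain_rate: 0.988
 attack_frac: 0.047
"""

m = parse_final_metrics(LOG)
for name, values in m.items():
    if len(values) > 1:
        print(f"{name} printed {len(values)} times: {values}")
```

Flagging repeated keys this way makes it explicit which of the two `retain_rate_neighbor` values is being compared with the paper.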

And this is the script I used:

python3 -m experiments.evaluate_parap_attack \
        --alg_name ROME \
        --ds_name cf_filt \
        --model_name gpt2-xl \
        --run 1 \
        --correctness_filter 1 \
        --norm_constraint 1e-4 \
        --kl_factor .0625 \
        --gpu 2 \
        --edit_layer 17 \
        -n 700 \
        --datapoints_to_execute 700 \
        --num_attack_parap 4 \
        --retain_rate \
        --attack mg \
        --bb_num_samples 5 \
        --fact_erasure 
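To rerun this configuration while changing one setting at a time (for example an attack-related flag such as `--num_attack_parap`), the command can be assembled from a flag table. This is a sketch under stated assumptions: the flags and values are copied from the command above, but the `build_command` helper and the sweep values in the `__main__` block are hypothetical and are not part of the repository.

```python
from __future__ import annotations

import shlex
import subprocess
import sys

# Flags copied from the command in this issue; None marks a bare switch.
BASE_FLAGS: dict[str, str | None] = {
    "--alg_name": "ROME",
    "--ds_name": "cf_filt",
    "--model_name": "gpt2-xl",
    "--run": "1",
    "--correctness_filter": "1",
    "--norm_constraint": "1e-4",
    "--kl_factor": ".0625",
    "--gpu": "2",
    "--edit_layer": "17",
    "-n": "700",
    "--datapoints_to_execute": "700",
    "--num_attack_parap": "4",
    "--retain_rate": None,
    "--attack": "mg",
    "--bb_num_samples": "5",
    "--fact_erasure": None,
}


def build_command(overrides: dict[str, str] | None = None) -> list[str]:
    """Return argv for experiments.evaluate_parap_attack, applying optional overrides."""
    flags = {**BASE_FLAGS, **(overrides or {})}
    cmd = [sys.executable, "-m", "experiments.evaluate_parap_attack"]
    for flag, value in flags.items():
        cmd.append(flag)
        if value is not None:
            cmd.append(value)
    return cmd


def run(cmd: list[str]) -> None:
    # Launch from the repository root so the experiments package is importable.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical sweep over one attack-related flag; prints commands only.
    for n_parap in ("1", "2", "4"):
        print(shlex.join(build_command({"--num_attack_parap": n_parap})))
```

Keeping the flags in one table means each rerun differs from the reported configuration only in the overridden value, which makes result differences easier to attribute.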

These numbers are inconsistent with the results reported in the paper. I'm wondering if I'm missing something. Thank you for your time and assistance.

[Screenshot 2024-02-02 18:32:17]
Vaidehi99 commented 7 months ago

The numbers in the appendix will change; some of them were computed with a smaller attack budget. Go ahead with what the script gives you.