Closed. Davidjinzb closed this issue 4 months ago.
Thank you for your queries. Here are my responses to each of your points:
Regarding the parameter settings for GA, I acknowledge there appears to be a typo in the paper: the learning rate of 1e-4 is only the default for class-wise unlearning on CIFAR-10 with ResNet-18. We plan to correct and clarify this in the appendix. Note that GA is quite sensitive to the learning rate, particularly when unlearning random data. We employed a grid search over the range [1e-5, 0.1] to identify the learning rate that minimizes the gap between retraining and GA. Given your results in Figure 3 and Figure 4, where the Unlearning Accuracy (UA) is relatively low, the learning rate is likely too small for effective unlearning, so I recommend increasing it. Additionally, for Fisher unlearning, you might consider lowering alpha, possibly to around 1e-9: the unlearned model's TA of only 5.03% indicates that alpha is too large.
For MIA-Efficacy, we opted to use the confidence value as the metric.
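For intuition, a confidence-based MIA efficacy metric can be sketched roughly as follows. This is an illustrative stand-in, not the repository's exact `SVC_MIA_forget_efficacy` implementation (the function name and the synthetic data below are assumptions): an SVM attacker is trained to separate member (retain) from non-member (test) confidences, and efficacy is the fraction of forget-set samples the attacker labels as non-members.

```python
# Hypothetical sketch of a confidence-based MIA efficacy metric.
# NOT the repository's exact SVC_MIA_forget_efficacy code.
import numpy as np
from sklearn.svm import SVC

def mia_efficacy_confidence(conf_retain, conf_test, conf_forget):
    """Train an SVC to separate member (retain) from non-member (test)
    max-softmax confidences, then report the fraction of forget-set
    samples classified as non-members (i.e., treated as forgotten)."""
    X = np.concatenate([conf_retain, conf_test]).reshape(-1, 1)
    y = np.concatenate([np.ones_like(conf_retain),   # 1 = member
                        np.zeros_like(conf_test)])   # 0 = non-member
    clf = SVC(C=3, gamma="auto", kernel="rbf")
    clf.fit(X, y)
    preds = clf.predict(np.asarray(conf_forget).reshape(-1, 1))
    return float((preds == 0).mean())  # higher = better unlearning

# Toy usage with synthetic confidence values
rng = np.random.default_rng(0)
retain = rng.uniform(0.9, 1.0, 200)   # members: high confidence
test = rng.uniform(0.0, 0.6, 200)     # non-members: lower confidence
forget = rng.uniform(0.0, 0.6, 100)   # a well-unlearned forget set
print(mia_efficacy_confidence(retain, test, forget))
```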
The primary factor contributing to discrepancies in replicating results seems to be the hyperparameters. The unlearning methods, including GA and Fisher, are highly sensitive to these parameters, and this sensitivity can vary across different random seeds and initial models. This hyperparameter sensitivity is precisely why we utilized a grid search strategy to identify a reasonably unlearned model.
I hope these responses clarify your concerns and aid in your further experiments.
Thank you very much for your prompt response, which has been a great help to us. I am also looking forward to the appendix. Could you give us an idea of when it might be ready? We are preparing a review paper and, for its rigor, we wish to replicate some of the studies. If possible, could you share the part of your work regarding the hyperparameter settings with us in advance?
Thank you in advance for your consideration.
Thank you for your attention to the details of our paper. We will update it ASAP.
Due to the high sensitivity of GA, FF, and IU methods to hyperparameters, which varies with different random seeds and initial models, we tailor the hyperparameters accordingly for each scenario:
FT: A learning rate of around 0.04 is used for random data forgetting.
GA: We conducted grid searches over [1e-3, 0.01] for various seeds and initial models, a range we found effective for GA.
FF: We performed grid searches over [1e-9, 1e-8] for random data forgetting, based on our findings.
IU: We used a grid search range of [0.1, 5], as indicated by our experimental results.
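The tuning procedure behind these ranges can be sketched as a log-spaced grid search that picks the value whose unlearning accuracy (UA) lands closest to the retrained model's. The sketch below is purely illustrative: `run_unlearning` is a hypothetical stand-in for actually unlearning a model and evaluating it, not part of the codebase.

```python
# Illustrative grid-search sketch over an unlearning hyperparameter.
# `run_unlearning` is a stand-in objective, NOT the repository's code.
import numpy as np

def run_unlearning(lr):
    # Stand-in: pretend UA peaks near lr = 5e-3 and degrades away from
    # it. A real run would unlearn the model and measure UA here.
    return 100.0 * np.exp(-((np.log10(lr) - np.log10(5e-3)) ** 2))

def grid_search(lo, hi, retrain_ua, num=10):
    """Pick the hyperparameter on a log-spaced grid in [lo, hi] whose
    UA is closest to the retrained model's UA."""
    grid = np.logspace(np.log10(lo), np.log10(hi), num)
    gaps = [abs(run_unlearning(v) - retrain_ua) for v in grid]
    return float(grid[int(np.argmin(gaps))])

# e.g. searching the GA learning-rate range [1e-3, 0.01] from above
best_lr = grid_search(1e-3, 0.01, retrain_ua=95.0)
print(f"{best_lr:.4g}")
```

The same loop applies to FF's alpha range [1e-9, 1e-8] or IU's [0.1, 5]; only the bounds and the evaluated hyperparameter change.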
I hope these responses clarify your concerns and aid in your further experiments.
Thank you~
Hi Jinghan,
Thank you for providing the range of parameters; it's been immensely helpful for our work. However, we've tested all the results for FF from 1e-9 to 1e-8, and they still show a deviation from the results in your paper. Below are the screenshots of our replication attempts. The numbers following "omp" in these figures represent the learning rates.
Figure 1: Results of [1e-9, 2e-9, 3e-9]
Figure 2: Results of [4e-9, 5e-9, 6e-9]
Figure 3: Results of [7e-9, 8e-9, 9e-9]
We would greatly appreciate any additional insights or suggestions you might have to explain these discrepancies.
Thank you!
Thank you for sharing your results. Apologies for any confusion regarding the hyperparameters. The range [1e-9, 1e-8] should actually be applied to alpha, not the learning rate. In the Fisher method there is no learning rate; alpha is the sole hyperparameter that requires tuning. For example, you can adjust your Fisher scripts by appending `--alpha 1e-9` to them.
If you encounter any further issues, please don't hesitate to reach out.
Hey Jinghan
Thanks for your reply, and I'm very sorry for the misunderstanding; there seems to have been a miscommunication in my last message. We have indeed adjusted the alpha parameter as per your instructions, using the following command:
python -u main_forget.py --save_dir ${save_dir} --mask ${mask_path} --unlearn fisher_new --num_indexes_to_replace 4500 --alpha ${alpha}
Thank you for the clarification. Could you provide the Fisher results for the dense model as well?

By the way, for random forgetting, we use `class_to_replace = -1`. Can you replicate the results using the following script from the original codebase:
python -u main_forget.py --save_dir test --mask $mask_path --unlearn fisher_new --num_indexes_to_replace 4500 --class_to_replace -1 --seed 1 --alpha 1e-9
Issue closed. Please don't hesitate to raise another issue if you have any more questions, thanks.
Hello,
I've been working on replicating some results from your paper using the provided commands and code modifications in the README. However, I am encountering some discrepancies in the results, particularly with the MIA-Efficacy values. Below, I detail the steps taken and the issues encountered.
Steps and Code Used:
Initial pruning of the model was done using the command:
I modified `arg_parser.py` with the following additions: when `class_to_replace` is set to None, a random selection of indexes equal to the number specified by `num_indexes_to_replace` will be chosen for the unlearning process.

As shown in Figure 1, the results obtained under the 95%-sparse model are as follows: UA=6.78, RA=99.99, TA=92.77. (Figure 1)
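The random-selection behavior of that modification could look roughly like the sketch below. This is an illustrative assumption, not the repository's actual `arg_parser.py` or dataset code, and the function name is hypothetical.

```python
# Illustrative sketch of forget-set index selection (hypothetical
# helper; NOT the repository's actual implementation).
import numpy as np

def select_forget_indexes(dataset_size, num_indexes_to_replace,
                          class_to_replace=None, labels=None, seed=1):
    rng = np.random.default_rng(seed)
    if class_to_replace is None:
        # Random data forgetting: sample uniformly from the whole
        # training set.
        pool = np.arange(dataset_size)
    else:
        # Class-wise forgetting: restrict the pool to one class.
        pool = np.where(np.asarray(labels) == class_to_replace)[0]
    return rng.choice(pool, size=num_indexes_to_replace, replace=False)

# e.g. 4500 random indexes out of a CIFAR-10-sized training set
idx = select_forget_indexes(50000, 4500)
print(len(idx))
```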
I would like to inquire which value in `SVC_MIA_forget_efficacy` represents MIA-Efficacy. Is it the `confidence` value, which closely matches the one mentioned in the paper, or the average of these values?

I used the following command to get the unlearning result of the Dense model:
As shown in Figure 2, the results under the Dense model are as follows: UA=4.9, RA=99.52, TA=94.62. (Figure 2)
The above results are relatively close to those reported in the paper. However, I conducted separate tests on GA for both the 95%-sparse model and the Dense model using the following commands. Sparse model command:
Dense Model command:
As shown in Figure 3, the results for the 95%-sparse model are as follows: UA=0.62, RA=99.39, TA=94.23. The UA value differs significantly from the 5.62±0.46 reported in the paper. Additionally, the MIA-Efficacy, whether taken as the average or a specific value, shows a considerable discrepancy from the reported 11.76±0.52. (Figure 3: 95%-sparse model result)
As illustrated in Figure 4, the results for the Dense model are as follows: UA=0.78, RA=99.52, TA=94.52. The UA value shows a significant difference from the 7.54±0.29 mentioned in the paper. Moreover, the average MIA-Efficacy is 8.5, which slightly deviates from the 10.04±0.31 reported in the paper. (Figure 4: Dense model result)
As shown in Figure 5, the results exhibit some discrepancies compared to the results shown in Figure 6 from the paper. (Figure 5; Figure 6)
Questions:
Thank you for your time and help.
Best regards, David