bjzhb666 / GS-LoRA

Continual Forgetting for Pre-trained Vision Models (CVPR 2024)
https://arxiv.org/abs/2403.11530
MIT License

Potential issue for continual forgetting #5

Closed ChengZe2005 closed 1 month ago

ChengZe2005 commented 1 month ago

After running the run_cl_forget.sh script for GS-LoRA, I did not obtain the expected results. On closer inspection, I found a potential issue in train/train_own_forget_cl.py: the sparsity warm-up phase appears to be missing, which may be contributing to the problem. It would be worth reviewing the script and adding the warm-up code if it is indeed absent. However, even after I added the warm-up process myself, I still could not reproduce the expected results, so there may be additional factors at play. I hope you can take a closer look at this.

bjzhb666 commented 1 month ago

Thanks for your attention. Because we have conducted many ablation studies, the released hyperparameters may not be the ones used for the tables in the paper. Which results do you want to reproduce? I can check the logs and give you some guidance.

ChengZe2005 commented 1 month ago

Thank you for your prompt response! I want to reproduce the GS-LoRA results in Table 3, using the script below:

export CUDA_VISIBLE_DEVICES=2
NUM_FIRST_CLS=80
PER_FORGET_CLS=$((100-$NUM_FIRST_CLS))

 # GS-LoRA
 for lr in 1e-2
 do
 for beta in 0.15 
 do
 python3 -u train/train_own_forget_cl.py -b 48 -w 0 -d casia100 -n VIT -e 100 \
     -head CosFace --outdir out_path/to/exps/CLGSLoRA/start${NUM_FIRST_CLS}forgetper${PER_FORGET_CLS}lr${lr}beta${beta} \
     --warmup-epochs 0 --lr $lr --num_workers 8  --lora_rank 8 --decay-epochs 100 \
     --vit_depth 6 --num_of_first_cls $NUM_FIRST_CLS --per_forget_cls $PER_FORGET_CLS \
     -r results/ViT-P8S8_casia100_cosface_s1-1200-150de-depth6/Backbone_VIT_Epoch_1110_Batch_82100_Time_2023-10-18-18-22_checkpoint.pth \
     --BND 110 --beta $beta --alpha 0.01 --min-lr 1e-5 --num_tasks 4 --wandb_group forget_cl_new \
     --cl_beta_list 0.15 0.15 0.15 0.15 
 done
 done

Without altering any other code, I could not reach the results shown in the table (all metrics stay at 74%). After going through the code, I found that train_own_forget_cl.py contains no sparse warm-up code, unlike train_own_forget.py, where it is present. I therefore copied the relevant snippets from train_own_forget.py into train_own_forget_cl.py and appended --warmup_alpha --big_alpha 0.01 to the script. With this change I get the correct outcome for task 0, but tasks 1, 2, and 3 still produce incorrect results. What could be the issue?
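For clarity, the warm-up I added is roughly of the following shape (a simplified sketch with illustrative names and a placeholder warm-up length, not the exact code from train_own_forget.py):

# Illustrative sketch only: when --warmup_alpha is set, use the larger
# --big_alpha weight for the sparsity (structure) loss during an initial
# warm-up phase, then fall back to the regular --alpha value afterwards.

def sparsity_weight(epoch, args, warmup_len=10):
    """Pick the weight of the group-sparsity term for this epoch.

    `warmup_len` is a placeholder for however long the warm-up phase is
    in the original implementation.
    """
    if getattr(args, "warmup_alpha", False) and epoch < warmup_len:
        return args.big_alpha   # e.g. 0.01 from --big_alpha
    return args.alpha           # e.g. 0.0001 from --alpha

# inside the per-epoch training loop, schematically:
#   alpha = sparsity_weight(epoch, args)
#   loss = data_loss + alpha * structure_loss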

TY-LEE-KR commented 1 month ago

I also have the same problem...

bjzhb666 commented 1 month ago

Thanks for your attention.

I checked our experimental logs in wandb and found that these were the parameters we used to get the results in the paper:

python train_own_forget_cl.py -b 48 -w 0 -d casia100 -n VIT -e 100 -head CosFace --outdir /data1/zhaohongbo/exps/draw-forget-CL/CLGSLoRA/start80forgetper20lr1e-2beta0.15 --warmup-epochs 0 --lr 1e-2 --num_workers 8 --lora_rank 8 --decay-epochs 100 --vit_depth 6 --num_of_first_cls 80 --per_forget_cls 20 -r /data/zhaohongbo/Github/amnesic-face-recognition/Face-Transformer/results/ViT-P8S8_casia100_cosface_s1-1200-150de-depth6-new/Backbone_VIT_Epoch_1110_Batch_82100_Time_2023-10-18-18-22_checkpoint.pth --BND 105 --beta 0.15 --alpha 0.0001 --min-lr 1e-5 --num_tasks 4 --wandb_group forget_cl_new --cl_beta_list 0.2 0.25 0.25 0.25

Here is our log: output.log

ChengZe2005 commented 1 month ago

Thank you for your continued responses! I tried the command you provided, but all the metrics (accf, accr, acco) stay at 74%. Perhaps something is wrong with the code on GitHub. Would you mind testing the released code to see whether it works properly?

bjzhb666 commented 1 month ago

By the way, have you finished the training process? It is normal for all the metrics (accf, accr, acco) to stay at 74% at first. I will check the code later.

ChengZe2005 commented 1 month ago

I haven't finished yet, but currently, I'm at:

Task 0 Epoch 91 Batch 5450: 
- Training forget Loss: 21.0000 (21.0000)
- Training remain Loss: 0.0000 (0.0000)    
- Training structure Loss: 0.0000 (0.0000)   
- Training total Loss: 21.0000 (21.0000)   
- Training forget Prec@1: 100.000 (100.000)        
- Training remain Prec@1: 100.000 (100.000)

Current learning rate: 0.0002545

Performing evaluation on the test set and saving checkpoints...

Test forget-0 Accuracy: 74.652956%
Test remain-0 Accuracy: 74.499165%

This output seems to be abnormal.

bjzhb666 commented 1 month ago

I just trained it and I can get a decreasing trend in the forget accuracy, i.e., the model can escape the local minimum (see the attached screenshot). I suspect the difference comes from the hardware: I am using an RTX 3090, and different machines, CUDA versions, or other factors (perhaps the Python version) can give different results even with the same seed, e.g., the generated random numbers differ across machines.

I recommend increasing $\beta$ or adding the --warmup_alpha --big_alpha 0.01 strategy to get a reasonable result.

Actually, we ran the experiment three times and got the same results when preparing the paper (see the attached screenshot).
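If you want to reduce run-to-run variance on a single machine, a generic PyTorch seeding setup like the sketch below can help (this is a standard snippet, not code from this repo); note that it still cannot guarantee identical numbers across different GPUs or CUDA versions:

import os
import random

import numpy as np
import torch


def set_reproducible(seed: int = 42) -> None:
    """Seed the common RNGs and prefer deterministic CUDA kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Deterministic cuDNN kernels trade some speed for repeatability;
    # results may still differ across GPU models or CUDA versions.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Required by some deterministic CUDA ops (CUDA >= 10.2).
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")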

ChengZe2005 commented 1 month ago

I'm sorry, but I followed your instructions and still got this result (see the attached screenshot).

Where did I go wrong? Below is the command I used:

python train/train_own_forget_cl.py -b 48 -w 0 -d casia100 -n VIT -e 100 -head CosFace --outdir /data1/zhaohongbo/exps/draw-forget-CL/CLGSLoRA/start80forgetper20lr1e-2beta0.15 --warmup-epochs 0 --lr 1e-2 --num_workers 8 --lora_rank 8 --decay-epochs 100 --vit_depth 6 --num_of_first_cls 80 --per_forget_cls 20 -r /data/zhaohongbo/Github/amnesic-face-recognition/Face-Transformer/results/ViT-P8S8_casia100_cosface_s1-1200-150de-depth6-new/Backbone_VIT_Epoch_1110_Batch_82100_Time_2023-10-18-18-22_checkpoint.pth --BND 105 --beta 0.15 --alpha 0.0001 --min-lr 1e-5 --num_tasks 4 --wandb_group forget_cl_new --cl_beta_list 0.2 0.25 0.25 0.25 --warmup_alpha --big_alpha 0.01

ChengZe2005 commented 1 month ago

In fact, I'm wondering whether the code on GitHub is different from the one you use.

bjzhb666 commented 1 month ago

I mean that you should use these strategies (increase $\beta$ or add --warmup_alpha --big_alpha 0.01). You may need to implement some of the code yourself, as you have already done.

bjzhb666 commented 1 month ago

I used the code from GitHub and got a result similar to yesterday's (see the attached screenshot). If you still cannot reproduce the results, I recommend using the two strategies above or increasing the learning rate slightly.

ChengZe2005 commented 1 month ago

Thanks for your continued replies. I implemented some code myself yesterday and have now obtained the correct results!

bjzhb666 commented 1 month ago

Based on your question, we plan to re-tune some hyperparameters to find a better selection and help others reproduce the results more easily.