OatmealLiu / class-iNCD

PyTorch implementation for the paper Class-incremental Novel Class Discovery (ECCV 2022)

Questions about the performance #2

Closed YananGu closed 2 years ago

YananGu commented 2 years ago

Hi, I ran the code on CIFAR-10, but I cannot reach the performance reported in the paper. Could you give me some help? My step-2 experiment log is linked below:

https://drive.google.com/file/d/1WaXOvGYKVD4sqxLWe204_vAeuIYBr4Vd/view?usp=sharing

OatmealLiu commented 2 years ago

Hey Yanan!

Thanks for the question!

We just fixed a bug at incd_ablation_expt.py line #1384 that was caused by a NumPy version change. The old code used targets_new = targets at that line, which behaved as intended under the older NumPy version on our server. However, in the conda environment we uploaded to this repo we use a newer NumPy version, and there targets_new = targets only makes targets_new share the same underlying data as targets instead of creating the independent copy we intended, so modifying targets_new also modifies targets. This is the bug that corrupts the evaluation results. We have now fixed it by using targets_new = np.copy(targets) at incd_ablation_expt.py line #1385, which performs a proper copy under the current NumPy version.
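In case it helps, here is a minimal standalone illustration of the aliasing issue and the np.copy fix. The variable names echo the ones in incd_ablation_expt.py, but the snippet does not depend on any repo code:

```python
import numpy as np

targets = np.array([0, 1, 2, 3, 4])

# Plain assignment only binds a second name to the same array object,
# so relabeling through targets_new also overwrites targets.
targets_new = targets
targets_new[targets == 0] = 9
print(targets)       # [9 1 2 3 4]  <- original labels corrupted

# np.copy allocates an independent array, so the original labels survive.
targets = np.array([0, 1, 2, 3, 4])
targets_new = np.copy(targets)
targets_new[targets == 0] = 9
print(targets)       # [0 1 2 3 4]  <- unchanged
print(targets_new)   # [9 1 2 3 4]
```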

I checked your step-2 training log. I suggest you:

  1. Download the latest updated code from this repo.
  2. Check your step-1 pre-training carefully, because the accuracy at the first epoch in your training log is a bit lower than ours.
  3. Conduct the step-2 training and evaluation again using the latest code.
  4. Alternatively, you can download and use our trained model weights for step-1, step-2, and two-step iNCD on CIFAR-10, CIFAR-100, and TinyImageNet from this Drive link: https://drive.google.com/file/d/1R6EB2biQj5iBPYZwC7dAzoy-qMJ-Naz6/view?usp=sharing

Note: for your convenience, I also updated the code and the README.md so that you can use our (or your own) trained model weights to reproduce the experimental results in the paper. Please follow the Testing the Trained Model section in the newly updated README.md.
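As a quick sanity check after downloading the weights, loading a checkpoint in PyTorch usually follows the pattern below. This is only a generic sketch: the model class, the checkpoint file name, and the state-dict layout are placeholders I made up for illustration, so please follow the Testing the Trained Model section for the actual commands.

```python
import torch
from torchvision.models import resnet18

# Placeholder backbone and path: substitute the network class and checkpoint
# file that the repo's testing instructions actually use.
model = resnet18(num_classes=10)
checkpoint = torch.load("./weights/step1_cifar10.pth", map_location="cpu")

# Some checkpoints wrap the weights under a key such as "state_dict";
# strict=False tolerates auxiliary heads the placeholder model does not have.
state_dict = checkpoint.get("state_dict", checkpoint)
model.load_state_dict(state_dict, strict=False)
model.eval()
```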

Have a nice day!

Best regards, Miu

YananGu commented 2 years ago

Hi Miu! Thank you for your detailed answer. I downloaded your pre-trained model and tested it, and the results were the same as what you showed.

I continued the experiments on CIFAR-10. First, I used your pre-trained step-1 model for the step-2 training. The overall average accuracy was similar to yours, but the performance on the old classes was lower than your results while the performance on the new classes was higher. I don't know whether this is normal; the results are linked below:

https://drive.google.com/file/d/1CPDIqjcqOOBGhzr9lTysCQ0QaAcRabXp/view?usp=sharing

I also tried retraining the step-1 model myself and then training the step-2 model on top of it. The results were not good, which suggests that my own step-1 model performs poorly. But I don't know why it is bad, since I trained it with the parameters given in the script.

The training log of the step-1 model I trained myself is here:

https://drive.google.com/file/d/1zvnprbljc8Jte-Nvd1IXLSK14YXiGDMt/view?usp=sharing

The step-2 training log based on the step-1 model I pre-trained myself:

https://drive.google.com/file/d/1m9AvBaNMBVI0F6tfAf8lABazhrel2Hxt/view?usp=sharing

Thanks very much for your nice work, and I hope to get your help!

Best regards, Yanan

OatmealLiu commented 2 years ago

Heyyy Yanan,

To pinpoint where the issue is, I re-ran the experiments on CIFAR-10 from scratch, from stage-1 to stage-2, using the code from this repo.

Regarding stage-1: clearly, the final performance on the old (base) classes of your stage-1 supervised pre-training does not reach ours. We reach 0.9218 for stage-1 at epoch 200. You can compare against our newest stage-1 training log (2022-08-26) here:

https://drive.google.com/file/d/1x1KkWAzyI6H1UHHG5XfTYBDbtUPmiYXt/view?usp=sharing

So I think your stage-1 problem comes from the dev environment. I therefore generated a requirements.txt file from the environment I used for the newest experiment above. You can download it and create the same environment as we used:

https://drive.google.com/file/d/17vCmL3Xy4InRoei0Vzkg3YlkaOsdAqI8/view?usp=sharing
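Once the environment is rebuilt, a small sanity check (not part of the repo) is to print the key package versions from the interpreter you launch training with and compare them against the pinned ones in requirements.txt:

```python
# Confirm the active environment matches the versions pinned in requirements.txt.
import numpy as np
import torch
import torchvision

print("numpy       :", np.__version__)
print("torch       :", torch.__version__)
print("torchvision :", torchvision.__version__)
print("CUDA build  :", torch.version.cuda)
print("GPU visible :", torch.cuda.is_available())
```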

Regarding stage-2: I suggest you set the total number of training epochs to 300 instead of 200; then I believe you can reproduce the result. The ACC on the old (base) classes will improve, while the ACC on the novel classes will decrease a bit. Below is our newest stage-2 training log, which uses the pre-trained model from the Aug. 26 stage-1 run. You can refer to it to check your future training results:

https://drive.google.com/file/d/1w5VXEC4QXKburlevIC5p8cLRp1WU4Gnb/view?usp=sharing

Thank you for your attention and support for our work. Good luck!

Best regards, Miu

YananGu commented 2 years ago

Hi, Miu,

Thanks for your patient answer; I got similar results. This work is a great job!

Best regards, Yanan

OatmealLiu commented 2 years ago

Dear Yanan,

My pleasure. Thank you for your interest in our work. Good luck!

Best regards, Miu