WisonZ / GKD

Code for GKD

Reproduction Failed. #8

Open Leun9 opened 1 year ago

Leun9 commented 1 year ago

We attempted to reproduce the experimental results by directly using the code from this repository but failed.

First, we trained a teacher model using the modified code, and its performance was similar to what was reported in the paper. Then we trained a student model using train_gkd.py but got poor results. To check for other issues, we removed the distillation loss and obtained a good MBF student.

Here are the evaluation results.

| Model | LFW | CFP_FP | AgeDB | CALFW | CPLFW | IJB-B@1e-5 | IJB-B@1e-4 | IJB-C@1e-5 | IJB-C@1e-4 |
|-------|-----|--------|-------|-------|-------|------------|------------|------------|------------|
| IR50  | 99.78 | 98.04 | 98.00 | 96.03 | 92.90 | 87.27 | 94.31 | 92.87 | 95.70 |
| GKD   | 99.65 | 95.49 | 97.28 | 95.78 | 90.57 | 64.24 | 86.21 | 67.73 | 86.93 |
| MBF   | 99.57 | 95.69 | 97.07 | 95.88 | 90.95 | 76.35 | 89.75 | 81.81 | 91.51 |
WisonZ commented 1 year ago

could you please provide the training log?

WisonZ commented 1 year ago

Also, the package versions may matter: python 3.8.5, pytorch 1.7.1, torchvision 0.8.2, timm 0.5.4.
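To reproduce with exactly these versions, one option is pinning them in a requirements file (a sketch based on the versions listed above; the file name and exact pins are assumptions, not part of the repository):

```text
# requirements.txt — versions WisonZ reported (with Python 3.8.5)
torch==1.7.1
torchvision==0.8.2
timm==0.5.4
```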

Leun9 commented 1 year ago

Here are the logs.

ir50.log mbf.log gkd.log

The package versions are: torch 1.10.2, torchvision 0.11.3, timm 0.9.7.

However, package versions alone should not cause such significant fluctuations in the results. There might be a mistake in some other detail of our experiment.

WisonZ commented 1 year ago

Try replacing the random seed and assigning a smaller weight to the GKD loss; the model seems to have converged to a suboptimal solution.
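For the seed suggestion, a minimal reproducibility helper might look like the sketch below. The function name is hypothetical; in an actual PyTorch run you would also seed numpy and torch (`torch.manual_seed`, `torch.cuda.manual_seed_all`), which are left as comments here to keep the sketch self-contained:

```python
import os
import random

def set_seed(seed: int) -> None:
    """Seed Python's RNG sources so runs are repeatable.
    In a real training script, also add:
        np.random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    """
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)

# Same seed -> same draw; changing the seed changes the trajectory.
set_seed(42)
print(random.random())
```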

WisonZ commented 1 year ago

Also, try to keep the same package versions; even a slight difference can be fatal.

Leun9 commented 1 year ago

Thanks for your suggestions. I will reduce the weight by half, i.e. 1.0 for the CE loss and 0.5 for the GKD loss. By the way, I have discarded all deterministic options (including using a fixed seed).
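The weighting described above amounts to a weighted sum of the two objectives. A minimal sketch (the function name and signature are illustrative, not the repository's API):

```python
def combined_loss(ce_loss: float, gkd_loss: float,
                  w_ce: float = 1.0, w_gkd: float = 0.5) -> float:
    """Weighted sum of the classification (CE) and distillation (GKD) losses:
    total = w_ce * CE + w_gkd * GKD, with w_gkd halved from 1.0 to 0.5."""
    return w_ce * ce_loss + w_gkd * gkd_loss

# Halving w_gkd shrinks the distillation term's contribution:
print(combined_loss(2.0, 4.0))  # 1.0*2.0 + 0.5*4.0 = 4.0
```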


Leun9 commented 12 months ago

We changed the weight of the GKD loss but failed to obtain similar results.

We tried our best to reproduce it, but there was still a significant difference from the results in the paper.

Could you provide the logs mentioned in the paper or generated by this repo?