Reproduction Failed - Githubissues

jaehoon00 commented 7 months ago

Hi, and first, thank you for sharing training codes.

I've been trying to reproduce the results in your paper using the code you uploaded but I have failed to reproduce them. Here's what I've done.

I trained a teacher model (IR50) using the modified code (just get rid of the KD part). -> I have succeeded to reproduce the result.
I trained a student model using your code directly (like you said in #6, CosFace (m=0.4), CosLR, Vanilla_GKD). -> I got very bad result (GKD in the table below).
Following your recommendation in #8, I've assigned smaller weight to the GKD loss (1 -> 0.5). -> The results (GKD_weight_0.5 below) got better, but are still poor compared to yours in the paper. -> I had similar results to @Leun9's though.
I used the IJB evaluation code from TFace following your recommendation.

Here's my training logs and results.

gkd.log gkd_weight_0.5.log

I've done several more experiments but I couldn't reproduce the similar results to yours in the paper.

I have the following questions.

Were you able to reproduce the results in the paper using the codes you provided? If so, can you give me some advice so that I can reproduce?
In your code, the teacher and student model are IR50 and MobileFaceNet(scale=2). Did you use these models to reproduce the results?
Can you please provide your training logs?

Leun9 commented 7 months ago

Hello,

We have made numerous attempts to replicate the GKD study, including directly running the provided code and meticulously recreating GKD in our own codebase. It is important to note significant discrepancies between this code repository and the methods described in the paper. Despite these efforts, we have yet to find substantial evidence supporting the data presented in the paper.

We observed that the ablation studies in the paper were conducted on datasets such as LFW, CFP-FP, CPLFW, AgeDB, and CALFW, using an evaluation metric of balanced positive and negative pair verification success rate. In comparison to benchmarks like IJB, MegaFace, and MFR, this metric shows considerable variance and does not effectively reflect objective data patterns.

Additionally, a notable issue is that the authors employed a stronger baseline in the reported data without providing any evaluation data for this baseline. Consequently, it is unclear to what extent GKD can enhance distillation performance.

The purpose of sharing our challenges in the replication process is to help future developers and researchers avoid similar pitfalls, not to criticize GKD. All methods have their limitations. In fact, we are actively reaching out to the authors in hopes of obtaining more detailed training or model information to accurately replicate the performance demonstrated by GKD in the paper.

jaehoon00 commented 6 months ago

Hi, @Leun9. Thank you for sharing your experiences of replicating the GKD.

The GKD showed such an outstanding result in its paper and its main idea seems to be logical. However it's quite difficult to reproduce it and I'm just trying to figure out what's making differences between the reported result and mine.

On your opinion about our main purpose here, I do agree. Thank you.

WisonZ / GKD

Reproduction Failed #10