Closed howardgriffin closed 3 years ago
Hi - would you be able to train with KL based KD?
I will have a try. By the way, is it normal for AdaptiveLossSoft to be negative?
I replaced the alpha-divergence with KL-divergence-based KD, but it doesn't seem to converge. The training accuracy first increases from 0.09 to 0.14, then decreases from 0.14 to 0.06. However, when I train without KD (using `criterion` instead of `soft_criterion` in the code), the training accuracy reaches 0.92 on my own dataset. I'm confused and don't know what I did wrong.
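For reference, a common KL-based KD term looks like the sketch below (names and the temperature `T` are illustrative, not this repo's API). Note that a true KL divergence is non-negative, so if your replacement produces negative values, check the argument order of `F.kl_div`: it expects log-probabilities as the input and probabilities as the target.

```python
import torch
import torch.nn.functional as F

def kl_kd_loss(student_logits, teacher_logits, T=1.0):
    # KL(teacher || student) at temperature T, scaled by T^2 as in
    # Hinton et al.'s distillation formulation.
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    # F.kl_div takes log-probs as input and probs as target.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
```

If this term alone doesn't converge, it is often combined with the hard-label cross-entropy (e.g. `loss = ce + lambda * kl_kd_loss(...)`) rather than used on its own.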
Hi - have you checked if the teacher model was loaded correctly?
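One quick way to check this: evaluate the teacher on a few batches before distilling. A correctly loaded teacher should be well above chance on its own data; near-random accuracy usually means the checkpoint keys didn't match (also worth loading with `strict=True` so mismatches raise instead of being silently dropped). A minimal sketch, with illustrative names:

```python
import torch

@torch.no_grad()
def check_teacher(teacher, loader, device="cuda", n_batches=10):
    # Sanity check: top-1 accuracy of the teacher on a few batches.
    teacher.eval()  # also required during distillation (frozen BN, no dropout)
    correct = total = 0
    for i, (x, y) in enumerate(loader):
        if i >= n_batches:
            break
        pred = teacher(x.to(device)).argmax(dim=1)
        correct += (pred == y.to(device)).sum().item()
        total += y.numel()
    return correct / max(total, 1)
```

Also make sure the teacher's outputs are detached (or computed under `torch.no_grad()`) inside the training loop, so no gradients flow into it.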
Hi, when I train knowledge distillation with AdaptiveLossSoft on my own dataset, the loss gradually becomes NaN and acc1 first increases and then decreases (acc5 is set the same as acc1 in my code, so just ignore it). Any suggestions?
```
Epoch: [0][   0/7176] Time 71.193 (71.193) Data 66.870 (66.870) Loss 2.8784e-01 (2.8784e-01) Acc@1 0.09 ( 0.09) Acc@5 0.09 ( 0.09)
Epoch: [0][ 100/7176] Time  1.870 ( 2.728) Data  0.000 ( 0.662) Loss 4.6225e-02 (-7.6008e-02) Acc@1 0.08 ( 0.11) Acc@5 0.08 ( 0.11)
Epoch: [0][ 200/7176] Time  1.931 ( 2.363) Data  0.000 ( 0.333) Loss 4.9116e-01 (7.5377e-02) Acc@1 0.09 ( 0.11) Acc@5 0.09 ( 0.11)
Epoch: [0][ 300/7176] Time  1.860 ( 2.250) Data  0.000 ( 0.222) Loss 2.0824e-01 (2.0263e-01) Acc@1 0.10 ( 0.11) Acc@5 0.10 ( 0.11)
Epoch: [0][ 400/7176] Time  2.271 ( 2.200) Data  0.000 ( 0.167) Loss 8.5119e-01 (3.0746e-01) Acc@1 0.12 ( 0.11) Acc@5 0.12 ( 0.11)
Epoch: [0][ 500/7176] Time  2.325 ( 2.173) Data  0.000 ( 0.134) Loss 1.6488e+00 (4.4695e-01) Acc@1 0.10 ( 0.11) Acc@5 0.10 ( 0.11)
Epoch: [0][ 600/7176] Time  2.029 ( 2.150) Data  0.000 ( 0.111) Loss 1.2261e+00 (6.1032e-01) Acc@1 0.13 ( 0.11) Acc@5 0.13 ( 0.11)
Epoch: [0][ 700/7176] Time  1.945 ( 2.134) Data  0.000 ( 0.096) Loss -3.8604e-01 (6.4971e-01) Acc@1 0.12 ( 0.11) Acc@5 0.12 ( 0.11)
Epoch: [0][ 800/7176] Time  1.981 ( 2.121) Data  0.000 ( 0.084) Loss -1.3327e-02 (5.4336e-01) Acc@1 0.15 ( 0.12) Acc@5 0.15 ( 0.12)
Epoch: [0][ 900/7176] Time  2.034 ( 2.108) Data  0.000 ( 0.074) Loss 1.0979e+00 (5.1461e-01) Acc@1 0.13 ( 0.12) Acc@5 0.13 ( 0.12)
Epoch: [0][1000/7176] Time  1.975 ( 2.101) Data  0.000 ( 0.067) Loss 1.6591e-02 (4.1266e-01) Acc@1 0.18 ( 0.12) Acc@5 0.18 ( 0.12)
Epoch: [0][1100/7176] Time  1.715 ( 2.095) Data  0.000 ( 0.061) Loss -1.0021e+00 (3.3724e-01) Acc@1 0.13 ( 0.12) Acc@5 0.13 ( 0.12)
Epoch: [0][1200/7176] Time  1.852 ( 2.088) Data  0.000 ( 0.056) Loss -3.7594e-01 (2.9739e-01) Acc@1 0.13 ( 0.12) Acc@5 0.13 ( 0.12)
Epoch: [0][1300/7176] Time  1.892 ( 2.082) Data  0.000 ( 0.052) Loss -7.8806e-02 (2.5089e-01) Acc@1 0.12 ( 0.12) Acc@5 0.12 ( 0.12)
Epoch: [0][1400/7176] Time  1.956 ( 2.078) Data  0.000 ( 0.048) Loss 8.6050e-02 (2.3144e-01) Acc@1 0.19 ( 0.13) Acc@5 0.19 ( 0.13)
Epoch: [0][1500/7176] Time  2.031 ( 2.074) Data  0.000 ( 0.045) Loss -1.8159e-01 (2.2123e-01) Acc@1 0.16 ( 0.13) Acc@5 0.16 ( 0.13)
Epoch: [0][1600/7176] Time  2.118 ( 2.072) Data  0.000 ( 0.042) Loss 3.8409e-01 (2.1557e-01) Acc@1 0.14 ( 0.13) Acc@5 0.14 ( 0.13)
Epoch: [0][1700/7176] Time  2.163 ( 2.069) Data  0.000 ( 0.039) Loss 3.2751e-01 (2.1508e-01) Acc@1 0.15 ( 0.13) Acc@5 0.15 ( 0.13)
Epoch: [0][1800/7176] Time  2.166 ( 2.068) Data  0.000 ( 0.037) Loss -3.0104e-01 (2.1683e-01) Acc@1 0.14 ( 0.13) Acc@5 0.14 ( 0.13)
Epoch: [0][1900/7176] Time  1.822 ( 2.066) Data  0.000 ( 0.035) Loss 3.6041e-01 (2.1936e-01) Acc@1 0.14 ( 0.13) Acc@5 0.14 ( 0.13)
Epoch: [0][2000/7176] Time  1.888 ( 2.065) Data  0.000 ( 0.034) Loss 6.0852e-02 (2.2056e-01) Acc@1 0.17 ( 0.14) Acc@5 0.17 ( 0.14)
Epoch: [0][2100/7176] Time  1.928 ( 2.064) Data  0.000 ( 0.032) Loss 7.0139e-01 (2.2213e-01) Acc@1 0.20 ( 0.14) Acc@5 0.20 ( 0.14)
Epoch: [0][2200/7176] Time  2.212 ( 2.061) Data  0.000 ( 0.031) Loss 2.7252e-01 (2.1953e-01) Acc@1 0.21 ( 0.14) Acc@5 0.21 ( 0.14)
Epoch: [0][2300/7176] Time  1.816 ( 2.060) Data  0.000 ( 0.029) Loss -1.5090e-01 (2.2140e-01) Acc@1 0.15 ( 0.14) Acc@5 0.15 ( 0.14)
Epoch: [0][2400/7176] Time  1.929 ( 2.059) Data  0.000 ( 0.028) Loss 4.2306e-01 (2.1328e-01) Acc@1 0.15 ( 0.15) Acc@5 0.15 ( 0.15)
Epoch: [0][2500/7176] Time  1.886 ( 2.057) Data  0.000 ( 0.027) Loss 2.7449e-01 (1.9290e-01) Acc@1 0.17 ( 0.15) Acc@5 0.17 ( 0.15)
Epoch: [0][2600/7176] Time  1.813 ( 2.056) Data  0.000 ( 0.026) Loss 5.1589e-02 (2.1373e-01) Acc@1 0.16 ( 0.15) Acc@5 0.16 ( 0.15)
Epoch: [0][2700/7176] Time  2.145 ( 2.055) Data  0.000 ( 0.025) Loss -6.0235e-01 (1.9399e-01) Acc@1 0.19 ( 0.15) Acc@5 0.19 ( 0.15)
Epoch: [0][2800/7176] Time  1.944 ( 2.054) Data  0.000 ( 0.024) Loss 7.8085e-02 (1.7437e-01) Acc@1 0.15 ( 0.15) Acc@5 0.15 ( 0.15)
Epoch: [0][2900/7176] Time  1.778 ( 2.053) Data  0.000 ( 0.023) Loss -1.6850e-03 (1.6211e-01) Acc@1 0.13 ( 0.15) Acc@5 0.13 ( 0.15)
Epoch: [0][3000/7176] Time  1.767 ( 2.052) Data  0.000 ( 0.022) Loss nan (nan) Acc@1 0.00 ( 0.15) Acc@5 0.00 ( 0.15)
Epoch: [0][3100/7176] Time  2.064 ( 2.050) Data  0.000 ( 0.022) Loss nan (nan) Acc@1 0.00 ( 0.14) Acc@5 0.00 ( 0.14)
Epoch: [0][3200/7176] Time  2.222 ( 2.048) Data  0.000 ( 0.021) Loss nan (nan) Acc@1 0.00 ( 0.14) Acc@5 0.00 ( 0.14)
Epoch: [0][3300/7176] Time  2.206 ( 2.046) Data  0.000 ( 0.020) Loss nan (nan) Acc@1 0.00 ( 0.13) Acc@5 0.00 ( 0.13)
Epoch: [0][3400/7176] Time  1.906 ( 2.044) Data  0.000 ( 0.020) Loss nan (nan) Acc@1 0.00 ( 0.13) Acc@5 0.00 ( 0.13)
Epoch: [0][3500/7176] Time  2.058 ( 2.042) Data  0.000 ( 0.019) Loss nan (nan) Acc@1 0.00 ( 0.13) Acc@5 0.00 ( 0.13)
Epoch: [0][3600/7176] Time  1.912 ( 2.040) Data  0.000 ( 0.019) Loss nan (nan) Acc@1 0.00 ( 0.12) Acc@5 0.00 ( 0.12)
Epoch: [0][3700/7176] Time  2.006 ( 2.038) Data  0.000 ( 0.018) Loss nan (nan) Acc@1 0.00 ( 0.12) Acc@5 0.00 ( 0.12)
Epoch: [0][3800/7176] Time  1.990 ( 2.036) Data  0.000 ( 0.018) Loss nan (nan) Acc@1 0.00 ( 0.12) Acc@5 0.00 ( 0.12)
Epoch: [0][3900/7176] Time  2.073 ( 2.035) Data  0.000 ( 0.017) Loss nan (nan) Acc@1 0.00 ( 0.11) Acc@5 0.00 ( 0.11)
Epoch: [0][4000/7176] Time  2.152 ( 2.033) Data  0.000 ( 0.017) Loss nan (nan) Acc@1 0.00 ( 0.11) Acc@5 0.00 ( 0.11)
Epoch: [0][4100/7176] Time  2.183 ( 2.033) Data  0.000 ( 0.016) Loss nan (nan) Acc@1 0.00 ( 0.11) Acc@5 0.00 ( 0.11)
Epoch: [0][4200/7176] Time  2.054 ( 2.031) Data  0.000 ( 0.016) Loss nan (nan) Acc@1 0.00 ( 0.10) Acc@5 0.00 ( 0.10)
Epoch: [0][4300/7176] Time  1.870 ( 2.030) Data  0.000 ( 0.016) Loss nan (nan) Acc@1 0.00 ( 0.10) Acc@5 0.00 ( 0.10)
Epoch: [0][4400/7176] Time  1.923 ( 2.029) Data  0.000 ( 0.015) Loss nan (nan) Acc@1 0.00 ( 0.10) Acc@5 0.00 ( 0.10)
Epoch: [0][4500/7176] Time  1.891 ( 2.028) Data  0.000 ( 0.015) Loss nan (nan) Acc@1 0.00 ( 0.10) Acc@5 0.00 ( 0.10)
Epoch: [0][4600/7176] Time  1.866 ( 2.027) Data  0.000 ( 0.015) Loss nan (nan) Acc@1 0.00 ( 0.10) Acc@5 0.00 ( 0.10)
Epoch: [0][4700/7176] Time  1.887 ( 2.026) Data  0.000 ( 0.014) Loss nan (nan) Acc@1 0.00 ( 0.09) Acc@5 0.00 ( 0.09)
Epoch: [0][4800/7176] Time  2.037 ( 2.025) Data  0.000 ( 0.014) Loss nan (nan) Acc@1 0.00 ( 0.09) Acc@5 0.00 ( 0.09)
Epoch: [0][4900/7176] Time  2.019 ( 2.024) Data  0.000 ( 0.014) Loss nan (nan) Acc@1 0.00 ( 0.09) Acc@5 0.00 ( 0.09)
Epoch: [0][5000/7176] Time  1.936 ( 2.023) Data  0.000 ( 0.014) Loss nan (nan) Acc@1 0.00 ( 0.09) Acc@5 0.00 ( 0.09)
Epoch: [0][5100/7176] Time  1.972 ( 2.022) Data  0.000 ( 0.013) Loss nan (nan) Acc@1 0.00 ( 0.09) Acc@5 0.00 ( 0.09)
```
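The log shows the loss swinging between roughly -1.0 and +1.6 before going NaN around iteration 3000, which is what unbounded density ratios in an alpha-divergence tend to produce. The sketch below is not the repo's AdaptiveLossSoft, just an illustration of the clipping idea: computing the teacher/student probability ratio in log space and clamping it before exponentiating. `alpha` and `log_r_clip` are illustrative values, not the repo's defaults.

```python
import torch
import torch.nn.functional as F

def clipped_alpha_div(student_logits, teacher_logits,
                      alpha=-1.0, log_r_clip=2.3):
    # Alpha-divergence D_alpha(p_teacher || q_student) for alpha not in {0, 1}:
    #   D_alpha = (E_q[(p/q)^alpha] - 1) / (alpha * (alpha - 1))
    # The log-ratio is clamped so rare, extreme ratios cannot overflow.
    log_q = F.log_softmax(student_logits, dim=1)   # student
    log_p = F.log_softmax(teacher_logits, dim=1)   # teacher
    log_r = (log_p - log_q).clamp(-log_r_clip, log_r_clip)
    # E_q[(p/q)^alpha], computed in log space for numerical stability.
    e = torch.exp(log_q + alpha * log_r).sum(dim=1)
    return ((e - 1.0) / (alpha * (alpha - 1.0))).mean()
```

Note that once the ratio is clamped the estimate is biased, so individual batch values can dip below zero; that may be one source of the negative losses in the log. If the loss still diverges, lowering the learning rate for the first epoch or adding `torch.nn.utils.clip_grad_norm_` on the student's gradients are the usual next things to try.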