Here is some of my training log:
{"train_lr": 1.7496272431971886e-05, "train_min_lr": 4.156635634982393e-07, "train_loss": 2.5413925336798826, "train_loss_scale": 96192.74973375932, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.193638559918815, "val_loss": 0.7811053160551018, "val_acc1": 82.81313397956617, "val_acc5": 95.51515336932557, "epoch": 0, "n_parameters": 86534800}
{"train_lr": 5.2498136215985944e-05, "train_min_lr": 1.24721208253919e-06, "train_loss": 2.527162428489064, "train_loss_scale": 359261.51224707137, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.2990597688986725, "val_loss": 0.7833489567032399, "val_acc1": 82.5707097117106, "val_acc5": 95.46969886086204, "epoch": 1, "n_parameters": 86534800}
{"train_lr": 8.749999999999999e-05, "train_min_lr": 2.0787606015801426e-06, "train_loss": 2.5302356374058257, "train_loss_scale": 1334868.5154419595, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.237491000765048, "val_loss": 0.7864182489082793, "val_acc1": 82.46464910449404, "val_acc5": 95.36868873133804, "epoch": 2, "n_parameters": 86534800}
{"train_lr": 0.00012250186378401408, "train_min_lr": 2.9103091206210958e-06, "train_loss": 2.5308258418108838, "train_loss_scale": 4930763.927582535, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.153906084668522, "val_loss": 0.7911817570698362, "val_acc1": 82.28788137840502, "val_acc5": 95.27272923787434, "epoch": 3, "n_parameters": 86534800}
{"train_lr": 0.00015750372756802812, "train_min_lr": 3.7418576396620468e-06, "train_loss": 2.5365308628113064, "train_loss_scale": 18088215.173588924, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.029241597182708, "val_loss": 0.805933638277873, "val_acc1": 81.98990167328806, "val_acc5": 95.28283023949825, "epoch": 4, "n_parameters": 86534800}
{"train_lr": 0.00017498011641374657, "train_min_lr": 4.157048823552037e-06, "train_loss": 2.5435110739193245, "train_loss_scale": 65813498.54739084, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 4.92803221966385, "val_loss": 0.7986793881327202, "val_acc1": 81.93939657384699, "val_acc5": 95.13131507873535, "epoch": 5, "n_parameters": 86534800}
{"train_lr": 0.00017486081571940597, "train_min_lr": 4.154214565459046e-06, "train_loss": 2.5354276380092062, "train_loss_scale": 237096545.60170394, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 4.958836034083138, "val_loss": 0.808870542164586, "val_acc1": 82.10606330755985, "val_acc5": 95.05050698482628, "epoch": 6, "n_parameters": 86534800}
{"train_lr": 0.00017462236139404518, "train_min_lr": 4.148549543095186e-06, "train_loss": 2.524969994367216, "train_loss_scale": 843756388.0553781, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 4.999303885351735, "val_loss": 0.8070539586237016, "val_acc1": 81.90909361405807, "val_acc5": 95.07070905512029, "epoch": 7, "n_parameters": 86534800}
{"train_lr": 0.00017426507913758665, "train_min_lr": 4.1400614941995566e-06, "train_loss": 2.523062697668283, "train_loss_scale": 3179487903.7614484, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 4.943076262458826, "val_loss": 0.8192052752418435, "val_acc1": 81.90909354701186, "val_acc5": 95.10101207386364, "epoch": 8, "n_parameters": 86534800}
{"train_lr": 0.00017378945695462042, "train_min_lr": 4.128762012425975e-06, "train_loss": 2.5120036209496064, "train_loss_scale": 11936944516.225773, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 4.957734875595227, "val_loss": 0.8337954825132293, "val_acc1": 81.48485103029194, "val_acc5": 95.01010297139486, "epoch": 9, "n_parameters": 86534800}
{"train_lr": 0.0001731961444878353, "train_min_lr": 4.1146665315073605e-06, "train_loss": 2.5052072117214226, "train_loss_scale": 44399624645.65708, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 4.932257898826315, "val_loss": 0.8161246777075716, "val_acc1": 81.80808339205655, "val_acc5": 94.83838587905421, "epoch": 10, "n_parameters": 86534800}
{"train_lr": 0.00017248595213069498, "train_min_lr": 4.097794304175157e-06, "train_loss": 2.4994882696186203, "train_loss_scale": 164205884905.64432, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 4.9147809340169255, "val_loss": 0.8378828509310305, "val_acc1": 81.65656818736683, "val_acc5": 94.86868887930206, "epoch": 11, "n_parameters": 86534800}
{"train_lr": 0.00017165984992053112, "train_min_lr": 4.078168375862423e-06, "train_loss": 2.491099612448162, "train_loss_scale": 603253084914.6411, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 4.960388170581021, "val_loss": 0.8293322562397534, "val_acc1": 81.67677033395478, "val_acc5": 94.82828481731993, "epoch": 12, "n_parameters": 86534800}
{"train_lr": 0.00017071896621359648, "train_min_lr": 4.0558155532264875e-06, "train_loss": 2.4852597661022813, "train_loss_scale": 2198730520826.82, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 4.947375250637087, "val_loss": 0.8434258197357761, "val_acc1": 81.36868938330448, "val_acc5": 94.73737578652121, "epoch": 13, "n_parameters": 86534800}
{"train_lr": 0.0001696645861438593, "train_min_lr": 4.03076636753434e-06, "train_loss": 2.4735982704141968, "train_loss_scale": 7937794807980.303, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.009435756645772, "val_loss": 0.8341946523087684, "val_acc1": 81.62626522988984, "val_acc5": 94.651517236883, "epoch": 14, "n_parameters": 86534800}
{"train_lr": 0.00016849814986766683, "train_min_lr": 4.003055032960588e-06, "train_loss": 2.4770582596770425, "train_loss_scale": 28322670130613.3, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.0126481548158, "val_loss": 0.8370377499664176, "val_acc1": 81.42929555835146, "val_acc5": 94.69697179389722, "epoch": 15, "n_parameters": 86534800}
{"train_lr": 0.00016722125059666612, "train_min_lr": 3.972719399854952e-06, "train_loss": 2.467465272897317, "train_loss_scale": 105084740706208.03, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 4.996976759614021, "val_loss": 0.8414169982759561, "val_acc1": 81.50000263214112, "val_acc5": 94.70202225425027, "epoch": 16, "n_parameters": 86534800}
{"train_lr": 0.00016583563242166842, "train_min_lr": 3.939800903043325e-06, "train_loss": 2.4564375836021317, "train_loss_scale": 396545484362203.44, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.031623249744582, "val_loss": 0.846190107419439, "val_acc1": 81.45454811211788, "val_acc5": 94.6414162236994, "epoch": 17, "n_parameters": 86534800}
{"train_lr": 0.0001643431879304312, "train_min_lr": 3.904344505232701e-06, "train_loss": 2.455339394977933, "train_loss_scale": 1476469646206960.8, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.0294415041153675, "val_loss": 0.8517721098699216, "val_acc1": 81.36363898884166, "val_acc5": 94.44949706106475, "epoch": 18, "n_parameters": 86534800}
{"train_lr": 0.00016274595562260769, "train_min_lr": 3.866398635597567e-06, "train_loss": 2.444450080428467, "train_loss_scale": 5467029419860430.0, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.109756442939385, "val_loss": 0.8655732078522732, "val_acc1": 81.17677023569743, "val_acc5": 94.37373947374749, "epoch": 19, "n_parameters": 86534800}
{"train_lr": 0.00016104611712540668, "train_min_lr": 3.8260151236315e-06, "train_loss": 2.442638687148722, "train_loss_scale": 2.011272101957207e+16, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.055103810006183, "val_loss": 0.8752835807008084, "val_acc1": 80.91919451395671, "val_acc5": 94.20707285100764, "epoch": 20, "n_parameters": 86534800}
{"train_lr": 0.00015924599421374846, "train_min_lr": 3.7832491283542694e-06, "train_loss": 2.4354949475437624, "train_loss_scale": 7.342929743880968e+16, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.054676846256906, "val_loss": 0.87335663093119, "val_acc1": 80.98485114473286, "val_acc5": 94.42929503469756, "epoch": 21, "n_parameters": 86534800}
{"train_lr": 0.00015734804563900017, "train_min_lr": 3.7381590629712986e-06, "train_loss": 2.43156581025281, "train_loss_scale": 2.6563084319732432e+17, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.05734470794503, "val_loss": 0.8714795445591428, "val_acc1": 81.02020463885682, "val_acc5": 94.3737394587199, "epoch": 22, "n_parameters": 86534800}
{"train_lr": 0.00015535486377061871, "train_min_lr": 3.690806515088127e-06, "train_loss": 2.4255231061198526, "train_loss_scale": 9.501779865576397e+17, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.1192303238966215, "val_loss": 0.8576946401842622, "val_acc1": 81.2474773383863, "val_acc5": 94.36868896484376, "epoch": 23, "n_parameters": 86534800}
{"train_lr": 0.00015326917105528375, "train_min_lr": 3.64125616258938e-06, "train_loss": 2.41831085453042, "train_loss_scale": 3.4728844257512955e+18, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.1131927570342, "val_loss": 0.8737354792141612, "val_acc1": 81.17677027846828, "val_acc5": 94.23737586281516, "epoch": 24, "n_parameters": 86534800}
{"train_lr": 0.00015109381629836518, "train_min_lr": 3.5895756852963304e-06, "train_loss": 2.4169662177892, "train_loss_scale": 1.3170808285322308e+19, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.15444566387211, "val_loss": 0.8888657587953812, "val_acc1": 80.67677019986239, "val_acc5": 94.15656782670455, "epoch": 25, "n_parameters": 86534800}
{"train_lr": 0.00014883177077279562, "train_min_lr": 3.5358356725245128e-06, "train_loss": 2.4039604506565455, "train_loss_scale": 4.908818078187619e+19, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.195993465864366, "val_loss": 0.8938927693238509, "val_acc1": 81.02020463885682, "val_acc5": 94.08081021857984, "epoch": 26, "n_parameters": 86534800}
{"train_lr": 0.00014648612416067158, "train_min_lr": 3.480109526667217e-06, "train_loss": 2.4003955779738004, "train_loss_scale": 1.8197251368985258e+20, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.188081701469624, "val_loss": 0.8852770750181399, "val_acc1": 80.92929550286496, "val_acc5": 93.95959814823036, "epoch": 27, "n_parameters": 86534800}
{"train_lr": 0.0001440600803331153, "train_min_lr": 3.4224733629365753e-06, "train_loss": 2.394745927143354, "train_loss_scale": 6.703692170088017e+20, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.172674979264744, "val_loss": 0.9077644252762702, "val_acc1": 80.46464907559482, "val_acc5": 93.8333355400779, "epoch": 28, "n_parameters": 86534800}
{"train_lr": 0.0001415569529741767, "train_min_lr": 3.363005905399451e-06, "train_loss": 2.384710983269083, "train_loss_scale": 2.4513935170327717e+21, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.201141524683045, "val_loss": 0.891060632891989, "val_acc1": 80.75252781839082, "val_acc5": 93.9444466007117, "epoch": 29, "n_parameters": 86534800}
{"train_lr": 0.0001389801610547368, "train_min_lr": 3.3017883794497277e-06, "train_loss": 2.385687115581367, "train_loss_scale": 8.885240664121348e+21, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.258956227630091, "val_loss": 0.8966912629379742, "val_acc1": 80.59091170339873, "val_acc5": 94.06565874619918, "epoch": 30, "n_parameters": 86534800}
{"train_lr": 0.00013633322416260613, "train_min_lr": 3.2389044008642157e-06, "train_loss": 2.37487191219712, "train_loss_scale": 3.1859629040446434e+22, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.304057523021048, "val_loss": 0.9020782967822419, "val_acc1": 80.46464908021869, "val_acc5": 93.9141436362989, "epoch": 31, "n_parameters": 86534800}
{"train_lr": 0.00013361975769518816, "train_min_lr": 3.1744398615936173e-06, "train_loss": 2.3757882558971772, "train_loss_scale": 1.1476507256558606e+23, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.279250928515189, "val_loss": 0.9114862455792977, "val_acc1": 80.32323493610728, "val_acc5": 93.82323453498609, "epoch": 32, "n_parameters": 86534800}
{"train_lr": 0.0001308434679212863, "train_min_lr": 3.108482812444394e-06, "train_loss": 2.366839743343294, "train_loss_scale": 4.3737462010884695e+23, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.340429253281115, "val_loss": 0.9114761136175282, "val_acc1": 80.43434595975009, "val_acc5": 93.85353753754588, "epoch": 33, "n_parameters": 86534800}
{"train_lr": 0.00012800814691878295, "train_min_lr": 3.041123342811975e-06, "train_loss": 2.361015862456986, "train_loss_scale": 1.6316958047221414e+24, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.378049115847729, "val_loss": 0.9110016783807373, "val_acc1": 80.26262886162961, "val_acc5": 93.65656786369554, "epoch": 34, "n_parameters": 86534800}
{"train_lr": 0.00012511766739511682, "train_min_lr": 2.9724534576294585e-06, "train_loss": 2.356807765167076, "train_loss_scale": 6.055572516035578e+24, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.367852822977641, "val_loss": 0.9145543371700651, "val_acc1": 80.36363889867609, "val_acc5": 93.7424264387651, "epoch": 35, "n_parameters": 86534800}
{"train_lr": 0.00012217597739763454, "train_min_lr": 2.902566951699973e-06, "train_loss": 2.3488302826048275, "train_loss_scale": 2.233744725273037e+25, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.420398276834823, "val_loss": 0.9199220857308539, "val_acc1": 80.54040663748077, "val_acc5": 93.82828499533913, "epoch": 36, "n_parameters": 86534800}
{"train_lr": 0.0001191870949210269, "train_min_lr": 2.831559281584263e-06, "train_loss": 2.3391715803672586, "train_loss_scale": 8.181041776527369e+25, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.447506755876084, "val_loss": 0.9325663230521379, "val_acc1": 80.28283075737231, "val_acc5": 93.65151733051647, "epoch": 37, "n_parameters": 86534800}
{"train_lr": 0.0001161551024192367, "train_min_lr": 2.759527435218465e-06, "train_loss": 2.335013076589622, "train_loss_scale": 2.9708418607850363e+26, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.413609537326744, "val_loss": 0.9330307109740333, "val_acc1": 80.3333359111439, "val_acc5": 93.65151736172763, "epoch": 38, "n_parameters": 86534800}
{"train_lr": 0.00011308414122931563, "train_min_lr": 2.686569799440298e-06, "train_loss": 2.3288872896240385, "train_loss_scale": 1.06770680438365e+27, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.486501457401738, "val_loss": 0.9365977344017674, "val_acc1": 80.15151773279364, "val_acc5": 93.57070934180058, "epoch": 39, "n_parameters": 86534800}
{"train_lr": 0.00010997840591485427, "train_min_lr": 2.6127860256044175e-06, "train_loss": 2.324737059085355, "train_loss_scale": 3.792262537810859e+27, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.496664955148809, "val_loss": 0.9488452352439436, "val_acc1": 80.21212387084961, "val_acc5": 93.51515372074012, "epoch": 40, "n_parameters": 86534800}
{"train_lr": 0.0001068421385367159, "train_min_lr": 2.53827689347078e-06, "train_loss": 2.3222027639079843, "train_loss_scale": 1.4521735391617107e+28, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.533382576938889, "val_loss": 0.947250349316656, "val_acc1": 80.0101035678748, "val_acc5": 93.4545477179325, "epoch": 41, "n_parameters": 86534800}
{"train_lr": 0.00010367962285889037, "train_min_lr": 2.4631441735514276e-06, "train_loss": 2.3113133480952888, "train_loss_scale": 5.422678348869677e+28, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.536935528881126, "val_loss": 0.9530479008385885, "val_acc1": 79.95454801964037, "val_acc5": 93.3030325941606, "epoch": 42, "n_parameters": 86534800}
{"train_lr": 0.00010049517849738676, "train_min_lr": 2.3874904881043534e-06, "train_loss": 2.3084373177684787, "train_loss_scale": 2.014665016437004e+29, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.552935206026823, "val_loss": 0.9539767923058519, "val_acc1": 80.03030560233377, "val_acc5": 93.33838608366071, "epoch": 43, "n_parameters": 86534800}
{"train_lr": 9.729315502015827e-05, "train_min_lr": 2.311419170963825e-06, "train_loss": 2.3012442102098363, "train_loss_scale": 7.441034773304549e+29, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.566783057932402, "val_loss": 0.971230893485824, "val_acc1": 79.91414392066724, "val_acc5": 93.17676996057683, "epoch": 44, "n_parameters": 86534800}
{"train_lr": 9.407792600611681e-05, "train_min_lr": 2.2350341263987176e-06, "train_loss": 2.2938226147994194, "train_loss_scale": 2.729363792344433e+30, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.556014176755667, "val_loss": 0.97488013441238, "val_acc1": 79.86363888133656, "val_acc5": 93.21212352983879, "epoch": 45, "n_parameters": 86534800}
{"train_lr": 9.08538830713473e-05, "train_min_lr": 2.158439687191841e-06, "train_loss": 2.292080524566963, "train_loss_scale": 9.929254701468187e+30, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.599760201601937, "val_loss": 0.9622197006221699, "val_acc1": 79.97980053063596, "val_acc5": 93.28788103508226, "epoch": 46, "n_parameters": 86534800}
{"train_lr": 8.762542987069219e-05, "train_min_lr": 2.081740472133932e-06, "train_loss": 2.288570363796589, "train_loss_scale": 3.5764216934234563e+31, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.661359726113125, "val_loss": 0.963753333054303, "val_acc1": 79.87373986215303, "val_acc5": 93.30303255948154, "epoch": 47, "n_parameters": 86534800}
{"train_lr": 8.439697608288564e-05, "train_min_lr": 2.005041243126912e-06, "train_loss": 2.2730071647221637, "train_loss_scale": 1.2724566025038551e+32, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.657375398408363, "val_loss": 0.9647111907997281, "val_acc1": 79.95454799420907, "val_acc5": 93.15151744726933, "epoch": 48, "n_parameters": 86534800}
{"train_lr": 8.117293138746367e-05, "train_min_lr": 1.928446762091739e-06, "train_loss": 2.275180721925264, "train_loss_scale": 4.801250273375605e+32, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.743854058436312, "val_loss": 0.9707247480111205, "val_acc1": 79.93939649061723, "val_acc5": 93.21212345123291, "epoch": 49, "n_parameters": 86534800}
{"train_lr": 7.795769944167225e-05, "train_min_lr": 1.85206164787628e-06, "train_loss": 2.26334770298558, "train_loss_scale": 1.8017864531397312e+33, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.680276030359177, "val_loss": 0.9778541995365406, "val_acc1": 79.82828542535955, "val_acc5": 93.1212144239021, "epoch": 50, "n_parameters": 86534800}
{"train_lr": 7.475567186560234e-05, "train_min_lr": 1.7759902333584877e-06, "train_loss": 2.2553635221549944, "train_loss_scale": 6.701187172989238e+33, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.7468062906473465, "val_loss": 0.9717100363872622, "val_acc1": 80.03030559770988, "val_acc5": 93.08081028447006, "epoch": 51, "n_parameters": 86534800}
{"train_lr": 7.157122224376345e-05, "train_min_lr": 1.7003364229402502e-06, "train_loss": 2.252801416642147, "train_loss_scale": 2.47809141336782e+34, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.7275057822878495, "val_loss": 0.9903939691898616, "val_acc1": 79.68182070298629, "val_acc5": 93.00000228419448, "epoch": 52, "n_parameters": 86534800}
{"train_lr": 6.840870015129405e-05, "train_min_lr": 1.6252035506265133e-06, "train_loss": 2.249707857898662, "train_loss_scale": 9.10283183015978e+34, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.7638082651538465, "val_loss": 0.973586861943997, "val_acc1": 79.97475002520012, "val_acc5": 93.04040630918561, "epoch": 53, "n_parameters": 86534800}
{"train_lr": 6.527242521296507e-05, "train_min_lr": 1.550694238883414e-06, "train_loss": 2.2435365244488334, "train_loss_scale": 3.3173192027393116e+35, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.775372911121645, "val_loss": 0.9824847435884646, "val_acc1": 80.11616423982562, "val_acc5": 92.96969922730418, "epoch": 54, "n_parameters": 86534800}
{"train_lr": 6.216668120309293e-05, "train_min_lr": 1.4769102584683704e-06, "train_loss": 2.235496751263865, "train_loss_scale": 1.1974022693658846e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.782773152517435, "val_loss": 0.9946640035776413, "val_acc1": 79.82828542304762, "val_acc5": 92.79798210375236, "epoch": 55, "n_parameters": 86534800}
{"train_lr": 5.909571019441642e-05, "train_min_lr": 1.4039523894234381e-06, "train_loss": 2.233559696475903, "train_loss_scale": 4.2715074305441784e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.815433789429446, "val_loss": 0.9911846395015068, "val_acc1": 79.84848737312086, "val_acc5": 92.95959822683623, "epoch": 56, "n_parameters": 86534800}
{"train_lr": 5.60637067639388e-05, "train_min_lr": 1.3319202834219288e-06, "train_loss": 2.22750227000362, "train_loss_scale": 6.280920785194538e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": NaN, "val_loss": 0.9873847829060998, "val_acc1": 80.06565916234797, "val_acc5": 92.98485075517134, "epoch": 57, "n_parameters": 86534800}
{"train_lr": 5.307481226363886e-05, "train_min_lr": 1.2609123276561697e-06, "train_loss": 2.2205143545390253, "train_loss_scale": 7.186890878168284e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": NaN, "val_loss": 1.0011568894465837, "val_acc1": 79.80808338743267, "val_acc5": 92.94949724139589, "epoch": 58, "n_parameters": 86534800}
{"train_lr": 5.013310916388107e-05, "train_min_lr": 1.1910255104524853e-06, "train_loss": 2.2143746949375247, "train_loss_scale": 4.6742394484364134e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": NaN, "val_loss": 0.9912417441255589, "val_acc1": 79.893941970594, "val_acc5": 93.03030537460789, "epoch": 59, "n_parameters": 86534800}
{"train_lr": 4.724261547725411e-05, "train_min_lr": 1.1223552887967514e-06, "train_loss": 2.210244324161444, "train_loss_scale": 5.728845259788663e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": NaN, "val_loss": 1.0028630472395732, "val_acc1": 79.97474998242927, "val_acc5": 92.88889117616596, "epoch": 60, "n_parameters": 86534800}
{"train_lr": 4.440727927044779e-05, "train_min_lr": 1.0549954579516978e-06, "train_loss": 2.206271047330004, "train_loss_scale": 8.568495269953244e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 5.9021657585336, "val_loss": 0.994865824157993, "val_acc1": 79.9393964513143, "val_acc5": 92.88384070656517, "epoch": 61, "n_parameters": 86534800}
{"train_lr": 4.163097327166961e-05, "train_min_lr": 9.890380233438013e-07, "train_loss": 2.1993736612498442, "train_loss_scale": 9.315920596656583e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": NaN, "val_loss": 0.9963350900750302, "val_acc1": 79.91414392760306, "val_acc5": 92.79293157866506, "epoch": 62, "n_parameters": 86534800}
{"train_lr": 3.891748958096248e-05, "train_min_lr": 9.245730748949793e-07, "train_loss": 2.1983513060374, "train_loss_scale": 6.705594266275981e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": NaN, "val_loss": 1.004168364457822, "val_acc1": 79.82828539992823, "val_acc5": 92.75757804639412, "epoch": 63, "n_parameters": 86534800}
{"train_lr": 3.627053449065398e-05, "train_min_lr": 8.616886639705755e-07, "train_loss": 2.1913310084908533, "train_loss_scale": 1.0640901857630684e+37, "train_weight_decay": 0.050000000000002244, "train_grad_norm": NaN, "val_loss": 1.0105812328278228, "val_acc1": 79.77778033747818, "val_acc5": 92.75252753633441, "epoch": 64, "n_parameters": 86534800}
{"train_lr": 3.3693723423005715e-05, "train_min_lr": 8.004706831117988e-07, "train_loss": 2.1877400047771625, "train_loss_scale": 5.451391918815453e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": NaN, "val_loss": 1.00147245414915, "val_acc1": 79.85858839439624, "val_acc5": 92.80808316779859, "epoch": 65, "n_parameters": 86534800}
{"train_lr": 3.1190575991981587e-05, "train_min_lr": 7.410027487168331e-07, "train_loss": 2.1835808049772365, "train_loss_scale": 7.30863060941163e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": NaN, "val_loss": 1.0065340039090833, "val_acc1": 79.69192175778475, "val_acc5": 92.77778010975231, "epoch": 66, "n_parameters": 86534800}
{"train_lr": 2.8764511195878814e-05, "train_min_lr": 6.833660868309007e-07, "train_loss": 2.1786500225051904, "train_loss_scale": 3.7803017707599767e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": NaN, "val_loss": 1.009672138075106, "val_acc1": 79.5101035343517, "val_acc5": 92.64646696610885, "epoch": 67, "n_parameters": 86534800}
{"train_lr": 2.641884274738669e-05, "train_min_lr": 6.276394222012444e-07, "train_loss": 2.1752734331372445, "train_loss_scale": 4.0223656549763985e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 6.007812708394088, "val_loss": 1.0166818619647249, "val_acc1": 79.69192174853701, "val_acc5": 92.76767910581647, "epoch": 68, "n_parameters": 86534800}
{"train_lr": 2.4156774547455342e-05, "train_min_lr": 5.738988707486195e-07, "train_loss": 2.1713659615871816, "train_loss_scale": 7.699330212006557e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": NaN, "val_loss": 1.0118676756468157, "val_acc1": 79.80808331807454, "val_acc5": 92.70707305445815, "epoch": 69, "n_parameters": 86534800}
{"train_lr": 2.1981396309150087e-05, "train_min_lr": 5.222178356020617e-07, "train_loss": 2.1668631372888028, "train_loss_scale": 7.777187016871489e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": NaN, "val_loss": 1.0194122136035115, "val_acc1": 79.70707330183549, "val_acc5": 92.58081040353485, "epoch": 70, "n_parameters": 86534800}
{"train_lr": 1.9895679337477122e-05, "train_min_lr": 4.726669068390827e-07, "train_loss": 2.165558610020227, "train_loss_scale": 2.661287148110375e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": NaN, "val_loss": 1.0143761590690068, "val_acc1": 79.76767932545054, "val_acc5": 92.64141650113193, "epoch": 71, "n_parameters": 86534800}
{"train_lr": 1.7902472470936922e-05, "train_min_lr": 4.253137650680783e-07, "train_loss": 2.161363046840468, "train_loss_scale": 2.817000757840237e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": Infinity, "val_loss": 1.0109273165482398, "val_acc1": 79.89899245984627, "val_acc5": 92.81313362815163, "epoch": 72, "n_parameters": 86534800}
{"train_lr": 1.600449819035434e-05, "train_min_lr": 3.802230889847935e-07, "train_loss": 2.161274295796157, "train_loss_scale": 7.528753030438845e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 6.066488486509353, "val_loss": 1.0163066493266302, "val_acc1": 79.85858840826786, "val_acc5": 92.72727503458658, "epoch": 73, "n_parameters": 86534800}
{"train_lr": 1.4204348900296822e-05, "train_min_lr": 3.374564670289736e-07, "train_loss": 2.15227733136485, "train_loss_scale": 6.438757762329808e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": NaN, "val_loss": 1.0089465808258324, "val_acc1": 79.96969952092026, "val_acc5": 92.78283062559186, "epoch": 74, "n_parameters": 86534800}
{"train_lr": 1.2504483388161602e-05, "train_min_lr": 2.970723132619855e-07, "train_loss": 2.159180669951093, "train_loss_scale": 3.361290602759619e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": NaN, "val_loss": 1.0172237967667372, "val_acc1": 79.8838409319791, "val_acc5": 92.80303261959192, "epoch": 75, "n_parameters": 86534800}
{"train_lr": 1.0907223465768307e-05, "train_min_lr": 2.591257875802241e-07, "train_loss": 2.153588682381363, "train_loss_scale": 5.690624646491333e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": NaN, "val_loss": 1.0171830624449945, "val_acc1": 79.82828540802002, "val_acc5": 92.71717405377012, "epoch": 76, "n_parameters": 86534800}
{"train_lr": 9.414750798042784e-06, "train_min_lr": 2.2366872037333235e-07, "train_loss": 2.1504720074934953, "train_loss_scale": 3.9013337128681873e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": NaN, "val_loss": 1.0125380098315828, "val_acc1": 79.85858840595593, "val_acc5": 92.72222455573804, "epoch": 77, "n_parameters": 86534800}
{"train_lr": 8.029103923125506e-06, "train_min_lr": 1.907495417301246e-07, "train_loss": 2.146065192461744, "train_loss_scale": 7.424000238438755e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": NaN, "val_loss": 1.0156030270635434, "val_acc1": 79.8434368723089, "val_acc5": 92.69192153468276, "epoch": 78, "n_parameters": 86534800}
{"train_lr": 6.752175467973662e-06, "train_min_lr": 1.6041321528890976e-07, "train_loss": 2.1501207848548636, "train_loss_scale": 1.9506668564340937e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": NaN, "val_loss": 1.0150650649911899, "val_acc1": 79.79293185031776, "val_acc5": 92.70202254324248, "epoch": 79, "n_parameters": 86534800}
{"train_lr": 5.585709563260262e-06, "train_min_lr": 1.3270117682257005e-07, "train_loss": 2.1439043388913235, "train_loss_scale": 3.260784545570345e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": NaN, "val_loss": 1.0201438695714238, "val_acc1": 79.77272978464762, "val_acc5": 92.66666902021929, "epoch": 80, "n_parameters": 86534800}
{"train_lr": 4.53129946110162e-06, "train_min_lr": 1.0765127764227876e-07, "train_loss": 2.139216823711658, "train_loss_scale": 3.9721126263817615e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": NaN, "val_loss": 1.0168515868880899, "val_acc1": 79.83838635646936, "val_acc5": 92.65656801628344, "epoch": 81, "n_parameters": 86534800}
{"train_lr": 3.5903853588664567e-06, "train_min_lr": 8.529773289716289e-08, "train_loss": 2.1397267239102824, "train_loss_scale": 6.614997256978607e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": NaN, "val_loss": 1.0220926060219822, "val_acc1": 79.66666917858701, "val_acc5": 92.59596195567738, "epoch": 82, "n_parameters": 86534800}
{"train_lr": 2.7642524320395503e-06, "train_min_lr": 6.567107484052485e-08, "train_loss": 2.13937266268574, "train_loss_scale": 7.634921400709205e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": NaN, "val_loss": 1.0210169901810406, "val_acc1": 79.74242674509684, "val_acc5": 92.60606297810872, "epoch": 83, "n_parameters": 86534800}
{"train_lr": 2.0540290788267346e-06, "train_min_lr": 4.879811112645688e-08, "train_loss": 2.1364489887227105, "train_loss_scale": 4.011041028814227e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": 6.131181861201541, "val_loss": 1.0197052084053484, "val_acc1": 79.77272980083119, "val_acc5": 92.62121450713187, "epoch": 84, "n_parameters": 86534800}
{"train_lr": 1.4606853788985048e-06, "train_min_lr": 3.470188819381008e-08, "train_loss": 2.1423822405437627, "train_loss_scale": 8.428353021196368e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": NaN, "val_loss": 1.0193877691484017, "val_acc1": 79.77272980545506, "val_acc5": 92.64141650113193, "epoch": 85, "n_parameters": 86534800}
{"train_lr": 9.85031768378016e-07, "train_min_lr": 2.340165978753179e-08, "train_loss": 2.136301658892879, "train_loss_scale": 7.093462712330366e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": Infinity, "val_loss": 1.019599454432438, "val_acc1": 79.76262878533565, "val_acc5": 92.6414165057558, "epoch": 86, "n_parameters": 86534800}
{"train_lr": 6.277179328827328e-07, "train_min_lr": 1.491286066036518e-08, "train_loss": 2.1418644405833986, "train_loss_scale": 8.269808254925962e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": NaN, "val_loss": 1.0199351372098289, "val_acc1": 79.75757828567967, "val_acc5": 92.6414165057558, "epoch": 87, "n_parameters": 86534800}
{"train_lr": 3.8923192013207655e-07, "train_min_lr": 9.247085490833716e-09, "train_loss": 2.1384670604357576, "train_loss_scale": 5.806702064653595e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": NaN, "val_loss": 1.0200537619977326, "val_acc1": 79.7626287876476, "val_acc5": 92.64141651500356, "epoch": 88, "n_parameters": 86534800}
{"train_lr": 2.6989947333293037e-07, "train_min_lr": 6.412073046305429e-09, "train_loss": 2.13739729558809, "train_loss_scale": 5.895175706545561e+36, "train_weight_decay": 0.050000000000002244, "train_grad_norm": NaN, "val_loss": 1.0194394502086916, "val_acc1": 79.76262879689534, "val_acc5": 92.65151752356327, "epoch": 89, "n_parameters": 86534800}
{"Final top-1": 83.07738937159021, "Final Top-5": 95.0848656294201}
You are using the distilled ViT-B model. That model already has a rather strong high-level semantic extraction ability, and the fine-tuning script you used is intended for the pre-trained model.
A pre-trained model means the model has just finished the pre-training stage; a distilled model means the model was distilled from the ViT-g model under the supervision of the teacher model's logits.
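For clarity, logit distillation of this kind can be sketched roughly as below. This is a generic illustration, not the repo's actual implementation; `student_logits`, `teacher_logits`, and the temperature value are placeholders.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    # Soften both distributions and minimize the KL divergence so the student
    # (ViT-B) matches the frozen teacher's (ViT-g's) predicted distribution.
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

# Usage (teacher frozen, only the student receives gradients):
# with torch.no_grad():
#     teacher_logits = teacher(video_clip)
# loss = distillation_loss(student(video_clip), teacher_logits)
```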
All right, I understand now. So the reported 86.6% is achieved by distilling ViT-B from ViT-g on the K400 dataset. If I train the distilled model on a downstream task, it will destroy the high-level semantic space that ViT-g created.
That's right. Alternatively, you may try fine-tuning for just one epoch with a small learning rate; it may be beneficial. A minimal sketch of that idea is below.
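This is a plain-PyTorch sketch of the suggestion, not the repo's script; `model` (the distilled ViT-B, already loaded) and `train_loader` are assumed placeholders, and in practice you would pass the equivalent options (a small learning rate and a single epoch) to the fine-tuning script.

```python
import torch

# Small learning rate and weight decay matching the log above; values are illustrative.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.05)
criterion = torch.nn.CrossEntropyLoss()

model.train()
for clips, labels in train_loader:   # a single epoch only
    optimizer.zero_grad()
    loss = criterion(model(clips), labels)
    loss.backward()
    optimizer.step()
```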
OK, thank you for your advice.
No problem :p
I am trying to reproduce the accuracy reported in the paper, so I used the pre-trained ViT-B model and fine-tuned it on the K400 dataset.
However, as training goes on, the top-1 accuracy keeps decreasing. I used the training policy provided here: https://github.com/OpenGVLab/VideoMAEv2/blob/master/scripts/finetune/vit_b_k400_ft.sh