OpenGVLab / VideoMAEv2

[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
https://arxiv.org/abs/2303.16727
MIT License

Hello! Could you provide a fine-tuning script for the distilled ViT-base model, or release a regular ViT-base model? Thank you very much!!! My email is 2256380854@qq.com #13

Closed DragonWang-cell closed 1 year ago

congee524 commented 1 year ago

What do you mean by a fine-tuning script for the distilled model? The distilled models we released were not fine-tuned; after the logits distillation we evaluate them directly.

What do you mean by a "regular model"?

DragonWang-cell commented 1 year ago

Hi! By "regular ViT-base model" I mean the checkpoint of the fine-tuned ViT-base model. I see that the scripts/finetune folder contains vit_b_k400_ft.sh. My GPU memory is too small to run ViT-g, so I wanted to reproduce the results with a ViT-b checkpoint first, but the released fine-tuned results all seem to start from the ViT-g pretrained model. Do you have a checkpoint for a fine-tuned ViT-base model? If so, I would really appreciate it if you could share it; if not, never mind.


congee524 commented 1 year ago

Just use the models released on this page; those are the ones we trained. If you want to reproduce the results, you can fine-tune from vit_b_hybrid_pt_800e.pth.
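Before launching the fine-tune run, it can help to sanity-check the downloaded checkpoint. Below is a minimal sketch (not part of the repo's tooling); the "model"/"module" wrapper keys are assumptions, since checkpoints vary in how they store the state dict:

```python
# Hedged sketch, not official tooling: inspect vit_b_hybrid_pt_800e.pth
# before fine-tuning. The "model"/"module" wrapper keys are assumptions;
# some checkpoints store the state dict at the top level.
import torch

ckpt = torch.load("vit_b_hybrid_pt_800e.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt.get("module", ckpt))

# A ViT-B backbone has roughly 86M parameters (the log later in this
# thread reports n_parameters = 86534800 for the full fine-tune model).
n_params = sum(t.numel() for t in state_dict.values())
print(f"{len(state_dict)} tensors, {n_params / 1e6:.1f}M parameters")
print("sample keys:", list(state_dict)[:5])
```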

congee524 commented 1 year ago

Also, we put the distilled models in the Model_Zoo; it sounds like you have been asking for the distilled models all along. The distillation hyperparameters are listed in the supplementary material of the paper.
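For readers unfamiliar with logits distillation, the sketch below shows only the standard temperature-scaled KD objective, not the paper's exact recipe; the actual hyperparameters (temperature, loss weighting, etc.) are in the supplementary material as noted above:

```python
# Generic logits-distillation loss (standard KD formulation, a sketch
# only; the temperature T = 2.0 here is illustrative, not the paper's).
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T: float = 2.0):
    # Soften both distributions with temperature T; the T*T factor keeps
    # gradient magnitudes comparable across temperatures.
    log_p = F.log_softmax(student_logits / T, dim=-1)
    q = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(log_p, q, reduction="batchmean") * (T * T)
```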

DragonWang-cell commented 1 year ago

Hi, in VideoMAE, fine-tuning from vit_b_hybrid_pt_800e.pth gives results close to the paper's, but when I fine-tune from vit_b_hybrid_pt_800e.pth in VideoMAEv2 the results are very poor (log below). I suspect the hyperparameters in the V2 ViT-b script do not match the vit_b_hybrid_pt_800e.pth model:

{"train_lr": 8.749044342507643e-06, "train_min_lr": 2.0785335635065734e-07, "train_loss": 1.5748268492575799, "train_loss_scale": 5640.457610921501, "train_weight_decay": 0.05000000000000669, "train_grad_norm": 9.221733793788527, "val_loss": 1.2811668501552014, "val_acc1": 72.51568748274933, "val_acc5": 91.0291462581521, "epoch": 0, "n_parameters": 86534800}
{"train_lr": 2.6249522171253826e-05, "train_min_lr": 6.236168285703647e-07, "train_loss": 1.550476075870592, "train_loss_scale": 6295.25843003413, "train_weight_decay": 0.05000000000000669, "train_grad_norm": 8.190266947837223, "val_loss": 1.6712491185883585, "val_acc1": 64.56183182462172, "val_acc5": 88.33738355514998, "epoch": 1, "n_parameters": 86534800}
{"train_lr": 4.374999999999999e-05, "train_min_lr": 1.0393803007900715e-06, "train_loss": 1.4913980040717043, "train_loss_scale": 6515.296109215017, "train_weight_decay": 0.05000000000000669, "train_grad_norm": 6.768115112599909, "val_loss": 2.1607980474974346, "val_acc1": 56.117184699196926, "val_acc5": 84.43129178749552, "epoch": 2, "n_parameters": 86534800}
{"train_lr": 6.125047782874617e-05, "train_min_lr": 1.4551437730097783e-06, "train_loss": 1.4430794694663722, "train_loss_scale": 6867.580068259385, "train_weight_decay": 0.05000000000000669, "train_grad_norm": 5.779042058854272, "val_loss": 2.831061159269613, "val_acc1": 44.378669823888, "val_acc5": 77.79801914745487, "epoch": 3, "n_parameters": 86534800}
{"train_lr": 7.875095565749237e-05, "train_min_lr": 1.8709072452294852e-06, "train_loss": 1.4447710629636517, "train_loss_scale": 7017.999726962457, "train_weight_decay": 0.05000000000000669, "train_grad_norm": 5.076708192284075, "val_loss": 3.5639363504873898, "val_acc1": 32.113945795972924, "val_acc5": 70.38555186259508, "epoch": 4, "n_parameters": 86534800}
{"train_lr": 8.749005627194589e-05, "train_min_lr": 2.078524365807435e-06, "train_loss": 1.4738024167151988, "train_loss_scale": 6922.379795221843, "train_weight_decay": 0.05000000000000669, "train_grad_norm": 4.834832134648223, "val_loss": 4.572349728383486, "val_acc1": 15.892532605725343, "val_acc5": 58.95567906489138, "epoch": 5, "n_parameters": 86534800}
{"train_lr": 8.743040205704365e-05, "train_min_lr": 2.077107144874224e-06, "train_loss": 1.4769724612085486, "train_loss_scale": 6501.596177474403, "train_weight_decay": 0.05000000000000669, "train_grad_norm": 4.794250246893977, "val_loss": 4.958667535515665, "val_acc1": 10.17506633589466, "val_acc5": 54.54361649810291, "epoch": 6, "n_parameters": 86534800}
{"train_lr": 8.731117103455707e-05, "train_min_lr": 2.074274541993863e-06, "train_loss": 1.4580995743917524, "train_loss_scale": 6391.717133105802, "train_weight_decay": 0.05000000000000669, "train_grad_norm": 4.6958389474254325, "val_loss": 5.492337100714156, "val_acc1": 5.585914117318612, "val_acc5": 48.84638905134357, "epoch": 7, "n_parameters": 86534800}
{"train_lr": 8.713252605972045e-05, "train_min_lr": 2.070030426161161e-06, "train_loss": 1.44284408966429, "train_loss_scale": 6637.477133105802, "train_weight_decay": 0.05000000000000669, "train_grad_norm": 4.614166949011556, "val_loss": 5.31722738470846, "val_acc1": 5.1457198154007076, "val_acc5": 45.08196873250844, "epoch": 8, "n_parameters": 86534800}
{"train_lr": 8.689471114008089e-05, "train_min_lr": 2.064380594327843e-06, "train_loss": 1.4253422912192426, "train_loss_scale": 6717.44, "train_weight_decay": 0.05000000000000669, "train_grad_norm": 4.577624832147741, "val_loss": 5.8473647157254725, "val_acc1": 2.6563450339061965, "val_acc5": 42.27383269754119, "epoch": 9, "n_parameters": 86534800}
{"train_lr": 8.65980511022123e-05, "train_min_lr": 2.0573327634845983e-06, "train_loss": 1.429575042926004, "train_loss_scale": 6874.29023890785, "train_weight_decay": 0.05000000000000669, "train_grad_norm": 4.702856262394109, "val_loss": 5.656165260857748, "val_acc1": 3.005464673114677, "val_acc5": 40.34102553487475, "epoch": 10, "n_parameters": 86534800}
{"train_lr": 8.624295114804348e-05, "train_min_lr": 2.0488965601206202e-06, "train_loss": 1.4149990657592388, "train_loss_scale": 6682.49119453925, "train_weight_decay": 0.05000000000000669, "train_grad_norm": 4.722320101323882, "val_loss": 4.951523631087785, "val_acc1": 8.409229360787739, "val_acc5": 50.89556950428447, "epoch": 11, "n_parameters": 86534800}
{"train_lr": 8.582989630139748e-05, "train_min_lr": 2.0390835070748973e-06, "train_loss": 1.432228427543575, "train_loss_scale": 6737.290921501706, "train_weight_decay": 0.05000000000000669, "train_grad_norm": 4.911686852081948, "val_loss": 5.816137488533571, "val_acc1": 1.8316131503783215, "val_acc5": 32.80206560321338, "epoch": 12, "n_parameters": 86534800}
{"train_lr": 8.535945074551012e-05, "train_min_lr": 2.0279070077975626e-06, "train_loss": 1.4392458312706735, "train_loss_scale": 6448.473993174061, "train_weight_decay": 0.05000000000000669, "train_grad_norm": 4.98053649354391, "val_loss": 5.770020530061815, "val_acc1": 2.4235986143296318, "val_acc5": 39.36450259585632, "epoch": 13, "n_parameters": 86534800}
{"train_lr": 8.483225705242246e-05, "train_min_lr": 2.0153823280422234e-06, "train_loss": 1.442067492415067, "train_loss_scale": 6133.655153583618, "train_weight_decay": 0.05000000000000669, "train_grad_norm": 5.109620900643487, "val_loss": 6.017422503855043, "val_acc1": 1.8872698950058195, "val_acc5": 35.82270924510851, "epoch": 14, "n_parameters": 86534800}
{"train_lr": 8.42490353053055e-05, "train_min_lr": 2.0015265750149016e-06, "train_loss": 1.4501616932995898, "train_loss_scale": 6059.004505119454, "train_weight_decay": 0.05000000000000669, "train_grad_norm": 5.231560899680896, "val_loss": 5.749083195134853, "val_acc1": 1.7101802293213484, "val_acc5": 40.366324014206256, "epoch": 15, "n_parameters": 86534800}
{"train_lr": 8.361058211491307e-05, "train_min_lr": 1.986358674007578e-06, "train_loss": 1.4557413590768091, "train_loss_scale": 5861.893242320819, "train_weight_decay": 0.05000000000000669, "train_grad_norm": 5.3330039641638365, "val_loss": 6.593494969953611, "val_acc1": 0.8500304045500578, "val_acc5": 27.95992826011301, "epoch": 16, "n_parameters": 86534800}
{"train_lr": 8.29177695315054e-05, "train_min_lr": 1.9698993425485258e-06, "train_loss": 1.448337069064277, "train_loss_scale": 5277.828805460751, "train_weight_decay": 0.05000000000000669, "train_grad_norm": 5.368583835992109, "val_loss": 5.9355190679864975, "val_acc1": 1.8670310846732903, "val_acc5": 35.61020164976137, "epoch": 17, "n_parameters": 86534800}
{"train_lr": 8.217154385373605e-05, "train_min_lr": 1.9521710621046996e-06, "train_loss": 1.4658520119629623, "train_loss_scale": 5414.268941979522, "train_weight_decay": 0.05000000000000669, "train_grad_norm": 5.613243079729337, "val_loss": 5.7612327824807865, "val_acc1": 1.9429266570044346, "val_acc5": 37.517710353941645, "epoch": 18, "n_parameters": 86534800}
{"train_lr": 8.137292433611968e-05, "train_min_lr": 1.9331980473747233e-06, "train_loss": 1.4653847824774504, "train_loss_scale": 4867.669624573379, "train_weight_decay": 0.05000000000000669, "train_grad_norm": 5.727510937992014, "val_loss": 6.736977968473457, "val_acc1": 0.8652095180898272, "val_acc5": 23.74013457909163, "epoch": 19, "n_parameters": 86534800}
{"train_lr": 8.052300179685534e-05, "train_min_lr": 1.913006213214518e-06, "train_loss": 1.4769458882515747, "train_loss_scale": 4767.017064846416, "train_weight_decay": 0.05000000000000669, "train_grad_norm": 5.870866237140123, "val_loss": 7.023920476364279, "val_acc1": 0.7842542431755506, "val_acc5": 20.881401460389334, "epoch": 20, "n_parameters": 86534800}


congee524 commented 1 year ago

My suggestion is to carefully double-check the hyperparameters, the code, the model being loaded, and so on; I am not sure how else to respond.

This log is very strange. Either the code I released has some bug, or you have made a low-level mistake somewhere, and right now I lean toward the latter. If I have time I will go over the code again (or wait to see whether anyone else reports a similar problem). Thanks for your understanding!
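One concrete low-level error worth ruling out is the pretrained weights silently failing to load. A hedged sketch, assuming the usual strict=False loading pattern; the model argument stands for however the fine-tune script constructs its ViT-B:

```python
# Hedged sketch for verifying the checkpoint actually loads into the
# model; this is not a function from the repo.
import torch
import torch.nn as nn

def check_load(model: nn.Module, ckpt_path: str) -> None:
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state_dict = ckpt.get("model", ckpt.get("module", ckpt))
    # strict=False mirrors the common fine-tune loading pattern, since
    # the classification head is newly initialized.
    result = model.load_state_dict(state_dict, strict=False)
    # Expect only head weights under missing_keys and only MAE/decoder
    # or distillation-specific tensors under unexpected_keys; anything
    # else suggests a key-name mismatch, i.e. the weights did not load.
    print("missing keys:", result.missing_keys)
    print("unexpected keys:", result.unexpected_keys)
```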

DragonWang-cell commented 1 year ago

Thanks for your reply! When I fine-tune from the pretrained model, the first-epoch accuracy is 0.24%. The log I posted above was produced with the model from the Fine-tune column. I suspect the model does not match the hyperparameters set in the ViT-b fine-tune script; the main issue is that I have not studied distillation carefully. Running only the final_test step of the main function with the ViT-b fine-tune script (using the model from the Fine-tune column, not the pretrained model) gives results close to the paper's. But training that same Fine-tune-column model for 75 epochs with the ViT-b script produced the log above, and rewriting the ViT-b fine-tune script with the distillation hyperparameters from the paper's supplementary material gave the same absurd results. Does this experiment mean that I cannot directly use the distilled model https://pjlab-gvm-data.oss-cn-shanghai.aliyuncs.com/internvideo/distill/vit_b_k710_dl_from_giant.pth as a pretrained model when fine-tuning other, possibly improved, methods? And if I can, what fine-tune script should I use?

congee524 commented 1 year ago

Whether it is the distilled model or a fine-tuned model, both have already been supervised by hard labels, directly or indirectly, so training for that many epochs will certainly overfit. K710 already contains most of the K400 training data; if you really want to transfer to it, training 1~2 epochs is generally enough (usually there is no need to train further, as it hurts generalization), and the learning rate should be reduced.
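As a sketch of what that advice implies in code (a generic PyTorch loop with dummy stand-ins, not the repo's trainer; the learning-rate value is an assumption):

```python
# Hedged sketch of the advice above: very few epochs, much smaller LR,
# when starting from a checkpoint already supervised by hard labels.
import torch

model = torch.nn.Linear(768, 400)  # stand-in for the loaded ViT-B classifier
train_loader = [(torch.randn(8, 768), torch.randint(0, 400, (8,)))]  # dummy batch

EPOCHS = 2      # 1~2 epochs, per the advice above
BASE_LR = 1e-5  # well below a from-scratch fine-tune LR (assumed value)

optimizer = torch.optim.AdamW(model.parameters(), lr=BASE_LR, weight_decay=0.05)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=EPOCHS * len(train_loader))

for epoch in range(EPOCHS):
    for clips, labels in train_loader:
        loss = torch.nn.functional.cross_entropy(model(clips), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()
```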

DragonWang-cell commented 1 year ago

Thank you for your answer!