Error during training - GitHub Issues

Sense-X / UniFormer

[ICLR2022] official implementation of UniFormer

Apache License 2.0

828 stars, 111 forks

Error during training #65

Closed: LEM0NTE closed this issue 2 years ago

LEM0NTE commented 2 years ago

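The following error is raised during training. I suspect that a few individual videos in the training set might be the problem. Do you know of a fix? (Testing runs without any problems.)

[screenshot of the error]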

Andy1621 commented 2 years ago

I have not encountered this bug. You can print the clip's information before calling transform.random_crop. Also, the dataset code is forked from PySlowFast, so you may find help there as well.
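A minimal debugging sketch of that suggestion. It assumes the clip reaching transform.random_crop is either None (decoding failed) or an array-like with a `.shape` in PySlowFast's (num_frames, channels, height, width) layout. The helper name `check_clip_before_crop` and the specific checks are hypothetical, not part of UniFormer or PySlowFast. It prints the video path together with whatever looks wrong, so a bad sample in the training list can be identified.

```python
def check_clip_before_crop(frames, crop_size, video_path):
    """Hypothetical guard to call just before transform.random_crop.

    Assumes `frames` is None (decode failed) or an array-like with a
    `.shape` in PySlowFast's (num_frames, channels, height, width) layout.
    Returns a list of problems; an empty list means the clip looks croppable.
    """
    problems = []
    if frames is None:
        problems.append("decoder returned no frames")
    else:
        shape = tuple(getattr(frames, "shape", ()))
        if len(shape) != 4:
            problems.append(f"expected a 4-D (T, C, H, W) clip, got shape {shape}")
        else:
            num_frames, _, height, width = shape
            if num_frames == 0:
                problems.append("clip has zero frames")
            if height < crop_size or width < crop_size:
                problems.append(
                    f"frame size {height}x{width} is smaller than crop size {crop_size}"
                )
    if problems:
        # Print the offending file so it can be inspected or dropped from the list.
        print(f"[bad clip] {video_path}: " + "; ".join(problems))
    return problems
```

One way to use it is inside the dataset's `__getitem__`, right before the crop, passing the path of the current video. Any clip it flags is a candidate for removal from the training list. Clips that pass are not guaranteed to be fine, since this only checks shape and emptiness.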

LEM0NTE commented 2 years ago

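Thank you for the answer. In the training code that reads the cfg, I found:

```python
if cfg.UNIFORMER.PRETRAIN_NAME:
    checkpoint = torch.load(model_path[cfg.UNIFORMER.PRETRAIN_NAME], map_location='cuda:1')
```

I changed parts of the network structure and removed some of the learnable parameters. As a result, if I set `PRETRAIN_NAME: 'uniformer_small_k400_8x8'` in the yaml, I get an error for keys such as `block.1.conv1.weight`. So I would like to know which parts of the network this checkpoint loads weights into. I also noticed that your paper mentions inflating 2D convolutions trained on ImageNet into 3D. I want to keep these pretrained weights (2D --> 3D) while training my own variant (e.g., with some layers removed), but I run into the error described above. Do you have any suggestions?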
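For context on the 2D --> 3D inflation mentioned above, here are two common schemes. The I3D-style scheme repeats each 2D kernel along a new temporal axis and divides by the temporal size. The center scheme places the 2D kernel in the middle time slice and fills the rest with zeros. The sketch below shows both with NumPy for illustration. It is an assumption that UniFormer's loader does something along these lines, and the function name `inflate_conv_weight` is hypothetical. In both schemes the inflated kernel summed over time equals the 2D kernel. So on a video of identical frames, the 3D convolution initially reproduces the 2D response, except near temporal borders where padding adds zeros.

```python
import numpy as np


def inflate_conv_weight(weight_2d, time_dim, center=False):
    """Inflate a 2D conv kernel (out, in, kh, kw) to 3D (out, in, t, kh, kw).

    Hypothetical helper for illustration. In PyTorch, the averaging branch
    would be weight_2d.unsqueeze(2).repeat(1, 1, time_dim, 1, 1) / time_dim.
    """
    if weight_2d.ndim != 4:
        raise ValueError(f"expected a 4-D kernel, got shape {weight_2d.shape}")
    if time_dim < 1:
        raise ValueError("time_dim must be at least 1")
    if center:
        # All temporal slices are zero except the middle one.
        shape_3d = weight_2d.shape[:2] + (time_dim,) + weight_2d.shape[2:]
        weight_3d = np.zeros(shape_3d, dtype=weight_2d.dtype)
        weight_3d[:, :, time_dim // 2] = weight_2d
    else:
        # Copy the kernel to every time step and average so the sum over t is unchanged.
        weight_3d = np.repeat(weight_2d[:, :, np.newaxis], time_dim, axis=2) / time_dim
    return weight_3d
```

Inflation like this only makes sense for a checkpoint key whose layer still exists, with a matching name, in the modified 3D network. Layers you removed have nowhere to receive their weights.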

Andy1621 commented 2 years ago

You can set strict=False when you use model.load_state_dict()

LEM0NTE commented 2 years ago


Thank you. I set strict=False at line 46 of UniFormer/video_classification/slowfast/models/build.py (the load_state_dict call below):

```python
if cfg.MODEL.ARCH in ['uniformer']:
    checkpoint = model.get_pretrained_model(cfg)
    if checkpoint:
        logger.info('load pretrained model')
        model.load_state_dict(checkpoint, strict=False)
```

But once training starts, I still get KeyError: 'blocks1.0.conv1.weight'. Did I edit the wrong file?

Andy1621 commented 2 years ago

I don't think deleting layers is a good idea. Adding layers is the more common approach.

If you really want to delete some layers, you should remove the corresponding keys and values from the checkpoint before calling load_state_dict(). Setting strict=False only works when you add layers.
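A minimal sketch of that advice follows. It assumes the checkpoint and `model.state_dict()` are plain mappings from parameter names to tensors or arrays with a `.shape`. The helper name `filter_state_dict` is hypothetical and not part of UniFormer or PySlowFast. On the KeyError itself: with strict=True, `load_state_dict` reports mismatched keys as a RuntimeError ("Missing key(s)" / "Unexpected key(s)"), not a KeyError. A bare KeyError therefore suggests the lookup fails earlier, in code that indexes one state dict with keys from the other while the checkpoint is being prepared. strict=False cannot affect that step. If that is the case, the filtering has to happen before that lookup, not only right before `load_state_dict`.

```python
def filter_state_dict(checkpoint, model_state):
    """Keep checkpoint entries whose key exists in the model with the same shape.

    Hypothetical helper. Values only need a `.shape` attribute, so torch
    tensors and NumPy arrays both work. Returns (kept, dropped_keys).
    """
    kept, dropped = {}, []
    for key, value in checkpoint.items():
        target = model_state.get(key)
        if target is not None and tuple(target.shape) == tuple(value.shape):
            kept[key] = value
        else:
            # Key belongs to a deleted layer, or its shape no longer matches.
            dropped.append(key)
    return kept, dropped


# Example use with PyTorch (not executed here):
#   kept, dropped = filter_state_dict(checkpoint, model.state_dict())
#   result = model.load_state_dict(kept, strict=False)
#   print("ignored checkpoint keys:", dropped)
#   print("left at initialization:", result.missing_keys)
```

Because of the shape check, 2D kernels that still need to be inflated to 3D would land in `dropped`, so inflate them before filtering.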

Andy1621 commented 2 years ago

As there has been no further activity, I am closing this issue. Don't hesitate to reopen it if necessary.