jinchiniao opened this issue 1 year ago
It should also be possible to address this by modifying the bash script as follows:
# Old (incorrect) decoder settings, kept here for reference:
#   model.visual_backbone.ddim=256
#   model.visual_backbone.dheads=4
#   model.visual_backbone.dunits=2048
#   model.visual_backbone.dlayers=6
python raven/test.py \
    data.modality=video \
    data/dataset=lrs3 \
    experiment_name=vsr_prelrs3vox2_large_ftlrs3vox2_selftrain_lm_test \
    model/visual_backbone=resnet_transformer_large \
    model.visual_backbone.ddim=1024 \
    model.visual_backbone.dheads=4 \
    model.visual_backbone.dunits=4096 \
    model.visual_backbone.dlayers=9 \
    model.pretrained_model_path=ckpts/vsr_prelrs3vox2_large_ftlrs3vox2_selftrain.pth \
    decode.lm_weight=0.2 \
    model.pretrained_lm_path=ckpts/language_model/rnnlm.model.best
Hi, thanks for spotting this!
You are right, the parameters were wrong. In general, the width of the decoder should match that of the encoder, except in the low-resource setting, where the decoder width is smaller to avoid overfitting. I have updated the model configurations as well as the scripts to reflect that.
Please let me know if it still doesn't work.
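For reference, you can peek at the decoder shapes stored in a checkpoint before running test.py. Below is a minimal sketch; it assumes the .pth file holds a plain PyTorch state dict (if it is nested under a key such as "model_state_dict", that name is only a guess, so adjust the unwrapping to match the actual layout):

import torch

# Minimal sketch: print decoder-related parameter shapes stored in a checkpoint.
ckpt = torch.load("ckpts/vsr_prelrs3vox2_large_ftlrs3vox2_selftrain.pth",
                  map_location="cpu")
# Unwrap a nested state dict if present ("model_state_dict" is an assumed key name).
state_dict = ckpt.get("model_state_dict", ckpt) if isinstance(ckpt, dict) else ckpt

for name, tensor in state_dict.items():
    if name.startswith("decoder."):
        print(name, tuple(tensor.shape))
# decoder.after_norm.weight has shape (ddim,), which should match the
# model.visual_backbone.ddim value passed on the command line.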
In scripts/vsr/lrs3_trainval/large_lrs3vox2.sh, the weights linked from the following row of the corresponding table have a minor naming error:

Large | LRS3+Vox2-en | 32.5 | Download | scripts/vsr/lrs3_trainval/large_lrs3vox2.sh

The uploaded weights file is named vsr_lrs3vox2_large_lrs3trainval.pth; it should be vsr_prelrs3vox2_large_ftlrs3trainval.pth.
Fixed, thank you.
When I run scripts/testing/vsr/lrs3/base_lrs3vox2.sh, the following error occurs:
size mismatch for decoder.decoders.5.norm1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for decoder.decoders.5.norm1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for decoder.decoders.5.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for decoder.decoders.5.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for decoder.decoders.5.norm3.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for decoder.decoders.5.norm3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for decoder.after_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for decoder.after_norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for decoder.output_layer.weight: copying a param with shape torch.Size([1049, 512]) from checkpoint, the shape in current model is torch.Size([1049, 256]).
Maybe you uploaded the wrong version of the weights?
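In case it is useful, here is a rough sketch for listing every shape mismatch between a checkpoint and a freshly built model before load_state_dict is called (how the model is instantiated and where the state dict sits inside the checkpoint are assumptions on my part):

import torch

def diff_shapes(model, ckpt_path):
    # Sketch only: assumes the checkpoint is a plain state dict, or nests one
    # under an assumed "model_state_dict" key.
    ckpt = torch.load(ckpt_path, map_location="cpu")
    ckpt_sd = ckpt.get("model_state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
    model_sd = model.state_dict()
    for name, tensor in ckpt_sd.items():
        if name not in model_sd:
            print("only in checkpoint:", name)
        elif tuple(tensor.shape) != tuple(model_sd[name].shape):
            print(name, "checkpoint", tuple(tensor.shape),
                  "vs model", tuple(model_sd[name].shape))

# Hypothetical usage, with a model built the same way as in finetune_learner.py:
# diff_shapes(learner.model, "ckpts/<downloaded checkpoint>.pth")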
When I run scripts/testing/vsr/lrs3/base_lrs3.sh, this error occurs:
Error executing job with overrides: ['data.modality=video', 'data/dataset=lrs3', 'experiment_name=vsr_prelrs3vox2_base_ftlrs3_test', 'model/visual_backbone=resnet_transformer_base', 'model.pretrained_model_path=ckpts/vsr_prelrs3vox2_base_ftlrs3.pth']
Traceback (most recent call last):
File "test.py", line 35, in main
learner = Learner(cfg)
File "/home/luosongtao/code/raven/finetune_learner.py", line 25, in __init__
self.model = self.load_model()
File "/home/luosongtao/code/raven/finetune_learner.py", line 42, in load_model
ckpt = torch.load(
File "/home/luosongtao/miniconda3/envs/byolav/lib/python3.8/site-packages/torch/serialization.py", line 713, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/luosongtao/miniconda3/envs/byolav/lib/python3.8/site-packages/torch/serialization.py", line 905, in _legacy_load
return legacy_load(f)
File "/home/luosongtao/miniconda3/envs/byolav/lib/python3.8/site-packages/torch/serialization.py", line 802, in legacy_load
tar.extract('storages', path=tmpdir)
File "/home/luosongtao/miniconda3/envs/byolav/lib/python3.8/tarfile.py", line 2060, in extract
tarinfo = self.getmember(member)
File "/home/luosongtao/miniconda3/envs/byolav/lib/python3.8/tarfile.py", line 1782, in getmember
raise KeyError("filename %r not found" % name)
KeyError: "filename 'storages' not found"
However, scripts/testing/vsr/lrs3_trainval/base_lrs3.sh and scripts/testing/vsr/lrs3_trainval/base_lrs3vox2.sh both run successfully, so maybe something is wrong with the weights you uploaded?
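For reference, a quick way to check whether a downloaded .pth file is readable at all, before going through the full test script (a minimal sketch; it only confirms that torch can deserialize the file, not that the weights are the right ones):

import torch

def check_checkpoint(path):
    # A truncated or corrupted download typically fails inside torch.load,
    # e.g. with the KeyError: "filename 'storages' not found" shown above.
    try:
        ckpt = torch.load(path, map_location="cpu")
    except Exception as exc:
        print(f"{path}: failed to load ({exc!r})")
        return False
    n_entries = len(ckpt) if isinstance(ckpt, dict) else 0
    print(f"{path}: loaded OK ({n_entries} top-level entries)")
    return True

check_checkpoint("ckpts/vsr_prelrs3vox2_base_ftlrs3.pth")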
The high-resource base LRS3+Vox2 weights file was corrupted, but I have now uploaded it again. I also fixed the config for the visual backbone (which was accidentally not pushed to the repo earlier).
Please let me know whether it works or not now :)
The high-resource base LRS3+Vox2 weights file for ASR (scripts/asr/lrs3/base_lrs3vox2.sh) requires authorization to download. I think its sharing permissions may have been set up incorrectly.
Some parameters in the configuration files are inconsistent with the provided model weights. For example, in conf/model/visual_backbone/resnet_transformer_large.yaml (the audio backbone may have a similar issue), there seems to be a mismatch in the decoder settings. This causes errors when running the related scripts: