facebookresearch / LaViLa

Code release for "Learning Video Representations from Large Language Models"
MIT License
478 stars, 42 forks

Base narrator model #33

Closed: sarisel closed this issue 6 months ago

sarisel commented 6 months ago

When I try to load the base narrator model vclm_openai_timesformer_base_gpt2_base.pt_ego4d.jobid_319630.ep_0002.md5sum_68a71f.pth using the VCLM_OPENAI_TIMESFORMER_BASE_GPT2 class, I get a dimension mismatch error at this line:

https://github.com/facebookresearch/LaViLa/blob/8002b5ab0db31789b9897a0a9c36729099e21ad4/lavila/models/timesformer.py#L364

Specifically:

RuntimeError: The size of tensor a (883) must match the size of tensor b (785) at non-singleton dimension 1

Do any parameters need to be changed when instantiating the model via VCLM_OPENAI_TIMESFORMER_BASE_GPT2? I used the same values as for VCLM_OPENAI_TIMESFORMER_LARGE_336PX_GPT2_XL in demo_narrator.py.
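The two numbers in the error are lengths along the token (sequence) dimension. As a sketch, assuming the visual encoder is TimeSformer-style (one [CLS] token followed by one token per frame-patch pair), the token count is 1 + T·(H/P)·(W/P). For example, 785 = 1 + 4·(224/16)², which would correspond to 4 frames of 224px crops with 16px patches; this is one configuration consistent with the number, not something the thread confirms. The helpers below (expected_tokens, embedding_shapes, check_input) are hypothetical names, not LaViLa functions. They compute that count, collect embedding shapes from a state dict, and fail early with a readable message when an input would produce more tokens than the positional embedding covers.

```python
# Hypothetical sanity-check helpers (not part of LaViLa). Assumptions:
# - the encoder prepends one [CLS] token to one token per (frame, patch) pair;
# - embedding parameters in the state dict have names ending in
#   "pos_embed" or "temporal_embed".
from typing import Any, Dict, Mapping, Tuple


def expected_tokens(img_size: int, patch_size: int, num_frames: int) -> int:
    """Token count for square frames: 1 + T * (H / P) * (W / P)."""
    if img_size % patch_size != 0:
        raise ValueError(f"img_size={img_size} is not divisible by patch_size={patch_size}")
    patches_per_frame = (img_size // patch_size) ** 2
    return 1 + num_frames * patches_per_frame


def embedding_shapes(state_dict: Mapping[str, Any]) -> Dict[str, Tuple[int, ...]]:
    """Collect the shapes of all positional/temporal embedding parameters."""
    return {
        name: tuple(value.shape)
        for name, value in state_dict.items()
        if name.endswith(("pos_embed", "temporal_embed"))
    }


def check_input(img_size: int, patch_size: int, num_frames: int, capacity: int) -> int:
    """Raise a descriptive error if the input yields more tokens than the
    positional embedding covers (capacity, e.g. 785 in the traceback)."""
    n_tokens = expected_tokens(img_size, patch_size, num_frames)
    if n_tokens > capacity:
        raise ValueError(
            f"{num_frames} frames at {img_size}px with patch {patch_size} give "
            f"{n_tokens} tokens, but the positional embedding covers only {capacity}"
        )
    return n_tokens


if __name__ == "__main__":
    # 785 = 1 + 4 * (224 / 16) ** 2: a 224px, patch-16, 4-frame input fits exactly.
    print(check_input(224, 16, 4, capacity=785))
    # A 336px crop with the same patch size and frame count does not fit.
    try:
        check_input(336, 16, 4, capacity=785)
    except ValueError as err:
        print(err)
```

To inspect a real checkpoint, one could pass its weights to embedding_shapes, for instance the mapping under ckpt["state_dict"] after torch.load(path, map_location="cpu"), assuming the checkpoint is laid out that way. The reported pos_embed and temporal_embed shapes should hint at which crop size and frame count the base model was built for, which can then be compared with the values copied from the large-model settings.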