Hi, when I run the demo code, I notice that the vision model first loads 'openai_clip_ViT-L-14-336px.pt' inside the function 'VCLM_OPENAI_TIMESFORMER_LARGE_336PX_GPT2_XL'. But in 'demo_narrator.py', those parameters are then overwritten by the checkpoint downloaded from the given URL ('vclm_openai xxx.pth').
When running inference, is it therefore unnecessary to load the CLIP ViT-L parameters in 'VCLM_OPENAI_TIMESFORMER_LARGE_336PX_GPT2_XL'?
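
For context, here is a minimal self-contained sketch of the load-then-overwrite pattern I am describing (the toy module and file names are my own, not the repo's actual API); it shows that a later `load_state_dict` with the full checkpoint replaces every parameter set by the earlier CLIP initialization:

```python
import torch
import torch.nn as nn

# Toy stand-in for the vision tower; names are illustrative only.
class ToyVisionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)

def build_model(clip_ckpt=None):
    model = ToyVisionModel()
    if clip_ckpt is not None:
        # Analogue of the builder loading openai_clip_ViT-L-14-336px.pt
        model.load_state_dict(torch.load(clip_ckpt))
    return model

# Simulate a saved CLIP init and a full fine-tuned checkpoint.
torch.save(ToyVisionModel().state_dict(), "clip_init.pt")
torch.save(ToyVisionModel().state_dict(), "vclm_full.pth")

model = build_model("clip_init.pt")        # step 1: CLIP initialization
full = torch.load("vclm_full.pth")
model.load_state_dict(full, strict=True)   # step 2: demo checkpoint

# Every parameter now matches the full checkpoint; the CLIP init is gone.
for name, p in model.state_dict().items():
    assert torch.equal(p, full[name])
print("full checkpoint overwrote the CLIP initialization")
```

If this is what happens at inference time, it seems the initial CLIP load is pure overhead that could be skipped when a full checkpoint is going to be loaded anyway.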