X-PLUG / Youku-mPLUG

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
Apache License 2.0

visual encoder #20

Closed miumiuc closed 4 months ago

miumiuc commented 5 months ago

The paper says TimeSformer is used as the visual encoder, but the code uses clip_vit_b16.pth. Can CLIP pre-trained weights be loaded into TimeSformer?

auhowielau commented 4 months ago

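Yes. The visual encoder uses the TimeSformer architecture, and clip_vit_b16 is used to initialize a subset of its parameters (those unrelated to temporal modeling).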
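For readers wondering how such a partial initialization can work in general, here is a minimal sketch. It is not taken from the Youku-mPLUG code: the function name `split_by_compatibility`, the parameter names, and the shapes are all hypothetical. The idea is to copy only the checkpoint entries whose name and shape match the target model, and leave everything else (for example temporal attention and time embeddings) at its default initialization.

```python
# Hypothetical sketch: parameter names and shapes are illustrative only,
# not the actual keys in clip_vit_b16.pth or the Youku-mPLUG model.

def split_by_compatibility(checkpoint_shapes, target_shapes):
    """Return (loadable, missing).

    loadable: names present in both dicts with identical shapes.
    missing:  target names that the checkpoint cannot initialize.
    """
    loadable = sorted(
        name
        for name, shape in checkpoint_shapes.items()
        if target_shapes.get(name) == shape
    )
    missing = sorted(name for name in target_shapes if name not in loadable)
    return loadable, missing


# Assumed image-only (CLIP ViT-like) checkpoint.
clip_ckpt = {
    "patch_embed.weight": (768, 3, 16, 16),
    "blocks.0.attn.qkv.weight": (2304, 768),
    "blocks.0.mlp.fc1.weight": (3072, 768),
}

# Assumed TimeSformer-like target with extra temporal parameters.
timesformer = {
    "patch_embed.weight": (768, 3, 16, 16),
    "blocks.0.attn.qkv.weight": (2304, 768),
    "blocks.0.temporal_attn.qkv.weight": (2304, 768),
    "blocks.0.mlp.fc1.weight": (3072, 768),
    "time_embed": (1, 8, 768),
}

loadable, missing = split_by_compatibility(clip_ckpt, timesformer)
print("initialized from checkpoint:", loadable)
print("left at default init:", missing)
```

In PyTorch, the same effect is usually obtained by passing the filtered state dict to `model.load_state_dict(filtered, strict=False)`, which reports the keys it could not fill instead of raising an error.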

zhiweibi commented 1 month ago

Where can this file be downloaded?

RuixiangZhao commented 3 weeks ago

@zhiweibi Did you manage to find it?