TXH-mercury / VALOR

Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
https://arxiv.org/abs/2304.08345
MIT License
Providing all versions of pretrained weights #21

YingtianDt closed 5 months ago

YingtianDt commented 10 months ago

Hi, could you also provide all versions of the pretrained weights for BERT, CLIP, and VideoSwin? Could you also explain how the different versions of these backbones correspond to the versions of the VALOR weights? Thanks!

TXH-mercury commented 5 months ago

Pretrained weights for BERT, CLIP, and VideoSwin are already provided. VALOR-base uses VideoSwin and BERT; VALOR-large uses CLIP and BERT.
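
For readers who want the pairing from the reply above in one place, it can be written as a small lookup table. This is only an illustrative sketch: the dictionary name, the `vision`/`text` keys, and the `backbones_for` helper are hypothetical and are not taken from the VALOR codebase or its config files. Only the backbones named in the reply are listed.

```python
# Hypothetical lookup table summarizing the reply above.
# The keys and labels are illustrative, not the repository's config names.
VALOR_BACKBONES = {
    "VALOR-base": {"vision": "VideoSwin", "text": "BERT"},
    "VALOR-large": {"vision": "CLIP", "text": "BERT"},
}


def backbones_for(model_name: str) -> dict:
    """Return the vision and text backbones stated for a VALOR variant."""
    try:
        return VALOR_BACKBONES[model_name]
    except KeyError:
        # Fail loudly on names the reply does not cover.
        raise ValueError(f"unknown VALOR variant: {model_name!r}") from None


if __name__ == "__main__":
    for name in VALOR_BACKBONES:
        print(name, backbones_for(name))
```

The exact checkpoint version of each backbone is not stated in the thread, so the table deliberately records only the backbone family.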