TencentARC / ViT-Lens

[CVPR 2024] ViT-Lens: Towards Omni-modal Representations
https://ailab-cvc.github.io/seed/vitlens/

Missing data #16

Closed ShuvenduRoy closed 2 weeks ago

ShuvenduRoy commented 2 weeks ago

Hi,

Thanks for releasing this great project!

I noticed that the file audioset_unbalanced_train.json, which is required for training the audio model, is missing. Could you please share this file or let me know how to obtain it?

StanLei52 commented 2 weeks ago

Hi,

I've uploaded our copy to the repository's releases.

To make your own copy, you can download the videos listed in AudioSet, then trim the video and audio according to the annotated timestamps. To enrich the captions, you can refer to the captions used by CLAP.
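The trimming step could be sketched roughly as below. This is only an illustration, not the script used for ViT-Lens: it assumes the public AudioSet segment CSV layout (`#` comment lines followed by rows of `YTID, start_seconds, end_seconds, positive_labels`), assumes the videos are already downloaded, and assumes `ffmpeg` is installed. The helper names `parse_segments` and `ffmpeg_trim_cmd` are hypothetical, and the sketch does not reproduce the layout of audioset_unbalanced_train.json, which is not described in this thread.

```python
import csv


def parse_segments(csv_text):
    """Parse AudioSet-style segment CSV text into a list of dicts.

    Assumed row layout: YTID, start_seconds, end_seconds, "label1,label2".
    Lines starting with '#' are treated as header comments and skipped.
    """
    lines = [
        ln for ln in csv_text.splitlines()
        if ln.strip() and not ln.startswith("#")
    ]
    segments = []
    # skipinitialspace lets the quoted label field follow ", " correctly.
    for ytid, start, end, labels in csv.reader(lines, skipinitialspace=True):
        segments.append({
            "ytid": ytid,
            "start": float(start),
            "end": float(end),
            "labels": labels.split(","),
        })
    return segments


def ffmpeg_trim_cmd(src, dst, start, end):
    """Build (but do not run) an ffmpeg command that cuts [start, end] from src."""
    duration = end - start
    return [
        "ffmpeg", "-y",            # overwrite dst if it already exists
        "-ss", f"{start:.3f}",     # seek to the annotated start time
        "-i", src,
        "-t", f"{duration:.3f}",   # keep only the annotated duration
        dst,
    ]
```

Each returned command list can be passed to `subprocess.run(cmd, check=True)`. Pointing `dst` at a `.wav` or `.flac` path makes ffmpeg write the trimmed audio track instead of a video clip.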

ShuvenduRoy commented 2 weeks ago

Great. Thanks!