PKU-YuanGroup / LanguageBind

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
https://arxiv.org/abs/2310.01852
MIT License

Video-Language Pre-training hours #56

Open msw6468 opened 5 months ago

msw6468 commented 5 months ago

Thanks for the nice research!

For the Vision-Language model trained on the 3M dataset, I am curious about the setup you used for pretraining (which GPUs, and how many) and how much time it took.
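
To make the question concrete, the kind of answer I am after is a back-of-envelope figure like the sketch below; the node count, GPU type, and wall-clock time in it are pure placeholders, not your actual configuration:

```python
# Rough GPU-hour estimate for the 3M pretraining run.
# Every value here is a placeholder guess, not the authors' real setup.
num_nodes = 2            # hypothetical number of machines
gpus_per_node = 8        # hypothetical, e.g. 8x A100 per node
wall_clock_hours = 48.0  # hypothetical end-to-end training time

total_gpu_hours = num_nodes * gpus_per_node * wall_clock_hours
print(f"{num_nodes * gpus_per_node} GPUs x {wall_clock_hours} h "
      f"= {total_gpu_hours:.0f} GPU-hours")
```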

Thank you.