PKU-YuanGroup / LanguageBind

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
https://arxiv.org/abs/2310.01852
MIT License

GPU resources #4

Closed · xiaoaoran closed this issue 7 months ago

xiaoaoran commented 7 months ago

Thanks for the work!

May I know how many GPU resources you used to train the foundation model?

LinB203 commented 7 months ago

You can check the GPU configuration at this line. Just 8 V100s for depth and infrared, and 16 V100s for video and audio. If you use gradient accumulation, then 8 V100s would also be enough for video and audio.
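For reference, here is a minimal sketch of gradient accumulation in PyTorch, which trades extra steps for a smaller per-step batch so fewer GPUs can reach the same effective batch size. This is illustrative only, not LanguageBind's actual training loop; the tiny model, optimizer, and random data are placeholders, and `accum_steps` is a hypothetical setting:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)                 # placeholder for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
accum_steps = 2                          # effective batch = per-step batch * accum_steps

optimizer.zero_grad()
for step in range(8):                    # placeholder for iterating a dataloader
    inputs = torch.randn(8, 16)          # synthetic batch of 8 samples
    targets = torch.randint(0, 4, (8,))
    loss = criterion(model(inputs), targets)
    (loss / accum_steps).backward()      # scale so accumulated gradients average
    if (step + 1) % accum_steps == 0:    # update weights once per accum_steps batches
        optimizer.step()
        optimizer.zero_grad()
```

With `accum_steps = 2`, each optimizer update sees gradients from two batches, so 8 GPUs can approximate the gradient statistics of a 16-GPU run at the cost of roughly twice the wall-clock time per update.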

xiaoaoran commented 7 months ago

Thanks for the answer!