facebookresearch / ImageBind

ImageBind: One Embedding Space To Bind Them All

Three or more modalities in one model #94

Open: vzapylikhin opened this issue 1 year ago

vzapylikhin commented 1 year ago

Hello! Your transformer is amazing! But I'm a beginner in data science. For a university project, I need to predict the outcome of negotiations. We have several modalities, including video, audio, and EEG time series. Do you have a demo showing how to use the model for tasks like this? If so, please share it. Thanks!
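One common starting point for a task like this is late fusion: compute a fixed embedding per modality, normalize each one, concatenate them, and train a small classifier on top (a "linear probe"). The sketch below is a minimal, self-contained illustration of that idea under stated assumptions, not an official ImageBind demo. ImageBind's released encoders cover images/video, text, audio, depth, thermal, and IMU data, but not EEG, so the EEG features are assumed to come from some separate extractor. All function names here (`fuse`, `train_logistic_regression`, and so on) are hypothetical, and the classifier is plain logistic regression trained with full-batch gradient descent.

```python
import math
import random
from typing import List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(video_emb: Sequence[float], audio_emb: Sequence[float],
         eeg_feat: Sequence[float]) -> List[float]:
    """Late fusion: normalize each modality on its own, then concatenate.

    video_emb / audio_emb: assumed to be frozen ImageBind embeddings.
    eeg_feat: hypothetical output of a separate EEG feature extractor,
    since ImageBind has no EEG encoder.
    """
    return l2_normalize(video_emb) + l2_normalize(audio_emb) + l2_normalize(eeg_feat)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability of the positive class (e.g. 'negotiation succeeds')."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logistic_regression(xs: Sequence[Sequence[float]], ys: Sequence[int],
                              lr: float = 0.5,
                              epochs: int = 200) -> Tuple[List[float], float]:
    """Fit a linear probe with full-batch gradient descent on binary cross-entropy."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = predict_proba(w, b, x) - y  # dLoss/dlogit for cross-entropy
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


if __name__ == "__main__":
    # Synthetic stand-in data only: label 1 shifts the first coordinate upward.
    rng = random.Random(0)

    def fake_features(label: int, dim: int = 4) -> List[float]:
        shift = 1.0 if label else -1.0
        return [shift + rng.gauss(0, 0.3)] + [rng.gauss(0, 0.3) for _ in range(dim - 1)]

    labels = [i % 2 for i in range(40)]
    xs = [fuse(fake_features(y), fake_features(y), fake_features(y)) for y in labels]
    w, b = train_logistic_regression(xs, labels)
    acc = sum((predict_proba(w, b, x) >= 0.5) == bool(y) for x, y in zip(xs, labels)) / len(xs)
    print(f"training accuracy on synthetic data: {acc:.2f}")
```

In practice, `video_emb` and `audio_emb` would be per-sample embedding vectors from a frozen ImageBind model (for example, the vision and audio outputs in the repository's README example, converted to lists), and the binary label would encode the negotiation outcome. Normalizing each modality before concatenating keeps one modality from dominating simply because its features have a larger scale. For real data, scikit-learn's `LogisticRegression` would be a more robust replacement for the hand-written trainer.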

LinB203 commented 11 months ago

Hi, I'd like to recommend our work, LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment. We provide online demos, and we open-source all of our training and validation code.