TXH-mercury / VAST

Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
https://arxiv.org/abs/2305.18500
MIT License
241 stars, 17 forks

Add images & update README #2

Closed: lihanddd closed this 1 year ago

lihanddd commented 1 year ago

Add images