Soldelli / MAD

MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions
MIT License
147 stars 3 forks source link