facebookresearch / GDT

We present a framework for training multi-modal deep learning models on unlabelled video data by forcing the network to learn invariances to transformations applied to both the audio and video streams.
Apache License 2.0
44 stars 4 forks source link