KoDohwan / VT-TWINS

Video-Text Representation Learning via Differentiable Weak Temporal Alignment (PyTorch implementation for the CVPR 2022 paper)