UARK-AICV / VLTinT

[AAAI 2023 Oral] VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning
https://uark-aicv.github.io/VLTinT/
65 stars, 6 forks