Problem
Generating multi-sentence descriptions for videos requires both visual relevance and discourse-based coherence across the sentences in the paragraph
difficulty of generating sentences that are relevant, non-redundant, and coherent with one another
Goal
produce more coherent and less repetitive paragraph captions than baseline methods
build a model that can span multiple video segments and capture longer-range dependencies
In this paper
Memory module maintains a highly summarized memory state from the video segments and the sentence history
works as a memory updater that updates its memory state (a container of highly summarized video-segment and caption-history information) using both the current inputs and the previous memory state
transformer-based model that uses a shared encoder-decoder architecture augmented with an external memory module
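The gated memory update described above can be sketched as follows. This is an illustrative toy, not the paper's exact formulation: the weight names (`W_z`, `U_z`, `W_c`, `U_c`), the segment-summary vector `S`, and the slot count are assumptions for the sketch; in MART the summary itself comes from attention over the encoder's hidden states, and all parameters are learned.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
d = 8        # hidden size (illustrative)
m_slots = 2  # number of memory slots (illustrative)

# Hypothetical parameters: learned in the real model, random here.
W_z, U_z = rng.normal(size=(d, d)), rng.normal(size=(d, d))
W_c, U_c = rng.normal(size=(d, d)), rng.normal(size=(d, d))

def update_memory(M_prev, S):
    """GRU-like gated update of the memory state.

    M_prev: (m_slots, d) previous memory state
    S:      (d,) summary of the current video segment + caption history
    """
    C = np.tanh(M_prev @ W_c + S @ U_c)   # candidate new memory content
    Z = sigmoid(M_prev @ W_z + S @ U_z)   # update gate in (0, 1)
    return (1 - Z) * M_prev + Z * C       # blend old state with candidate

M0 = np.zeros((m_slots, d))               # initial memory state
s1 = rng.normal(size=(d,))                # summary of segment 1 (dummy input)
M1 = update_memory(M0, s1)
print(M1.shape)  # (2, 8)
```

The gate lets the model keep long-range information (gate near 0 preserves the old state) while folding in the current segment and caption, which is what allows coherence across sentences.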
Paper: MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning