cmhungsteve / Awesome-Transformer-Attention

An ultimately comprehensive paper list of Vision Transformer/Attention, including papers, codes, and related websites
4.66k stars, 490 forks

Requesting to add four papers published in 2022 and 2023 #66

Closed: ShramanPramanick closed this issue 9 months ago

ShramanPramanick commented 10 months ago

Please add the following four papers which use transformer backbones:

  1. Egocentric video-language pre-training, applied to video-text retrieval, video classification, text-guided video grounding, text-guided video summarization, video question answering, etc.
  2. Image-language pre-training, applied to image captioning, image-text retrieval, object detection, segmentation, and referring expression comprehension.
  3. Video temporal grounding that unifies diverse temporal annotations to power moment retrieval (interval), highlight detection (curve), and video summarization (point).

cmhungsteve commented 9 months ago

Thank you for sharing, @ShramanPramanick. I have updated the repo with the papers above.