"video_masks" and "grounding_att_masks"

JonghwanMun / LGI4temporalgrounding

Repository for the CVPR-20 paper "Local-Global Video-Text Interactions for Temporal Grounding"


"video_masks" and "grounding_att_masks" #17

Open TAY-985 opened 2 years ago

TAY-985 commented 2 years ago

Hello, may I ask a question? What is the difference between "video_masks" and "grounding_att_masks"? I understand "grounding_att_masks", but I do not understand "video_masks". How is it used?

JonghwanMun commented 2 years ago

The video mask indicates the length of each video and guides the network to compute attention weights only on unmasked positions. This is required for batch-level training, where videos of different lengths are padded to a common length. For example, when we have two videos of different lengths, the first having 3 clip features and the second having 5, the mask is shaped as follows:

[ [ 1 1 1 0 0]
  [ 1 1 1 1 1] ]
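
To illustrate the idea, here is a minimal framework-free Python sketch. It is not taken from the repository: the helper names `build_video_masks` and `masked_softmax` are hypothetical, and the actual code may build and apply the mask differently (for example with tensor operations). It shows how a mask like the one above could be built from the video lengths and how it keeps attention weights off the padded positions.

```python
import math

def build_video_masks(lengths):
    """Return one 0/1 row per video, padded to the longest video in the batch."""
    max_len = max(lengths)
    return [[1 if t < n else 0 for t in range(max_len)] for n in lengths]

def masked_softmax(scores, mask):
    """Softmax over one row of scores; masked (0) positions get zero weight."""
    valid = [s for s, m in zip(scores, mask) if m]
    peak = max(valid)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]

# Two videos with 3 and 5 clip features, as in the example above.
video_masks = build_video_masks([3, 5])
print(video_masks)  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]

# Hypothetical attention scores for the first video; the last two
# entries belong to padding and must not receive any attention.
scores = [0.5, 0.5, 0.5, 9.0, 9.0]
weights = masked_softmax(scores, video_masks[0])
print(weights)
```

Even though the padded positions have the largest raw scores here, they end up with zero weight, and the remaining weights still sum to 1 over the real clip features.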

TAY-985 commented 2 years ago

OK, I see. Thank you.