JonghwanMun / LGI4temporalgrounding

Repository for the CVPR-20 paper "Local-Global Video-Text Interactions for Temporal Grounding"

"video_masks" and "grounding_att_masks" #17

Open TAY-985 opened 2 years ago

TAY-985 commented 2 years ago

Hello, may I ask you a question? What is the difference between "video_masks" and "grounding_att_masks"? I understand "grounding_att_masks", but I do not understand "video_masks". How is it used?

JonghwanMun commented 2 years ago

The video mask indicates the length of each video and guides the network to compute attention weights only over unmasked positions. This is required for batch-level training. For example, given two videos of different lengths (the first with 3 clip features, the second with 5), the mask is shaped as follows:

[ [ 1 1 1 0 0]
  [ 1 1 1 1 1] ]
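To make this concrete, here is a minimal plain-Python sketch of building such a mask from per-video clip counts and using it in an attention softmax so that padded positions receive zero weight. The function names `make_video_mask` and `masked_softmax` are hypothetical and not taken from the repository. In a PyTorch codebase the same effect is usually achieved by filling masked scores with a large negative value before calling softmax; the repository's actual implementation may differ.

```python
import math
from typing import List, Optional


def make_video_mask(lengths: List[int], max_len: Optional[int] = None) -> List[List[int]]:
    """Hypothetical helper: 1 marks a real clip feature, 0 marks batch padding."""
    if max_len is None:
        max_len = max(lengths)  # pad every video to the longest one in the batch
    return [[1 if t < n else 0 for t in range(max_len)] for n in lengths]


def masked_softmax(scores: List[float], mask: List[int]) -> List[float]:
    """Hypothetical helper: softmax over scores, giving padded positions zero weight.

    Assumes at least one position is unmasked (every video has >= 1 clip).
    """
    # Subtract the max over valid positions only, for numerical stability.
    peak = max(s for s, m in zip(scores, mask) if m)
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Two videos with 3 and 5 clip features, as in the example above.
mask = make_video_mask([3, 5])
print(mask)  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]

# Padded positions get zero attention weight even if their raw scores are large.
weights = masked_softmax([0.5, 1.0, 2.0, 9.0, 9.0], mask[0])
```

Without the mask, the large scores at the padded positions would take most of the attention weight, even though those positions hold no real video content.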
TAY-985 commented 2 years ago

OK, I see. Thank you.