Open TAY-985 opened 2 years ago
The video mask indicates the length of each video and guides the network to compute attention weights only on unmasked positions (those marked 1). This is required for batch-level training, where videos of different lengths are padded to a common length. For example, given two videos of different lengths, the first with 3 clip features and the second with 5, the mask looks like this:
[[1, 1, 1, 0, 0],
 [1, 1, 1, 1, 1]]
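To make this concrete, here is a minimal NumPy sketch of how such a mask could be built from video lengths and applied to attention scores. It is not taken from this repository: the function names `build_video_mask` and `masked_softmax` are hypothetical, and the uniform scores are made up for illustration.

```python
import numpy as np

def build_video_mask(lengths, max_len=None):
    """Return a (batch, max_len) mask: 1 for real clip features, 0 for padding.

    Hypothetical helper for illustration only.
    """
    lengths = np.asarray(lengths)
    if max_len is None:
        max_len = int(lengths.max())
    positions = np.arange(max_len)[None, :]          # shape (1, max_len)
    return (positions < lengths[:, None]).astype(np.float32)

def masked_softmax(scores, mask):
    """Softmax over the last axis that gives zero weight to masked positions."""
    # Push padded positions to a very negative value so exp() is ~0 there.
    masked = np.where(mask > 0, scores, -1e9)
    # Subtract the row maximum for numerical stability.
    masked = masked - masked.max(axis=1, keepdims=True)
    exp = np.exp(masked) * mask                       # force exact zeros on padding
    return exp / exp.sum(axis=1, keepdims=True)

# Two videos with 3 and 5 clip features, as in the example above.
mask = build_video_mask([3, 5])
scores = np.zeros((2, 5), dtype=np.float32)           # made-up attention scores
weights = masked_softmax(scores, mask)
```

With uniform scores, the first video spreads its attention over its 3 real positions only and the two padded positions get zero weight, while the second video attends over all 5 positions.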
OK, I see. Thank you.
Hello, may I ask a question? What is the difference between "video_masks" and "grounding_att_masks"? I understand "grounding_att_masks", but I do not understand "video_masks". How is it used?