antoyang / TubeDETR

[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers
Apache License 2.0

Purpose of the Mask #21

Closed by TalalWasim 12 months ago

TalalWasim commented 1 year ago

Hi,

When you load data, you create a mask per image in video_collate_fn. It is unclear to me what the purpose of this mask is and what exactly it is used for. Could you clarify that?

Kind regards,

antoyang commented 12 months ago

IIRC this is related to padding: the mask records which spatial parts of each image are padded and which are not. I'm not sure it is actually used in practice in the experiments, as they were done with 1 video per batch.
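
To illustrate the padding idea above, here is a minimal sketch. It assumes the DETR-style convention, where the mask is True on padded pixels and False on real ones. The helper name pad_and_mask is hypothetical, and NumPy stands in for PyTorch to keep the example small. It is not the repository's video_collate_fn.

```python
import numpy as np

def pad_and_mask(images):
    """Pad (C, H, W) arrays to a common size and build padding masks.

    Hypothetical helper for illustration only. Assumed convention:
    mask is True where a pixel is padding, False where it is real.
    """
    max_h = max(img.shape[1] for img in images)
    max_w = max(img.shape[2] for img in images)
    channels = images[0].shape[0]

    # Start from an all-zero batch and an all-padding mask,
    # then copy each image into the top-left corner.
    batch = np.zeros((len(images), channels, max_h, max_w), dtype=images[0].dtype)
    mask = np.ones((len(images), max_h, max_w), dtype=bool)
    for i, img in enumerate(images):
        _, h, w = img.shape
        batch[i, :, :h, :w] = img
        mask[i, :h, :w] = False  # real pixels are not padding
    return batch, mask

a = np.ones((3, 4, 6), dtype=np.float32)  # 4 x 6 image
b = np.ones((3, 5, 3), dtype=np.float32)  # 5 x 3 image
batch, mask = pad_and_mask([a, b])
print(batch.shape)                   # (2, 3, 5, 6)
print(mask[0].sum(), mask[1].sum())  # 6 15 (padded pixels per image)
```

When every image in the batch has the same size, as with the frames of a single video, the mask comes out all False. That fits the remark that it may have no effect with 1 video per batch.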