ejcgt / attention-target-detection

[CVPR2020] "Detecting Attended Visual Targets in Video"
MIT License

How are the initial weights for training obtained? #11

Closed qiaomu-miao closed 2 years ago

qiaomu-miao commented 3 years ago

Hello,

Thanks for the great work! In the training scripts for GazeFollow and VideoAttentionTarget, the models are initialized from initial_weights_for_spatial_training.pt and initial_weights_for_temporal_training.pt, respectively. Your paper says that for training on VideoAttentionTarget only the layers after the encoder were trained, so I assume initial_weights_for_temporal_training.pt holds the weights obtained after training on GazeFollow. Is that correct? However, the spatial model trained on GazeFollow is itself initialized from initial_weights_for_spatial_training.pt. How were those initial weights obtained? Do they contain the pretrained ResNet-50 weights for the scene and head branches?
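For context, here is a minimal sketch of how the contents of such a checkpoint could be inspected by counting parameter tensors per top-level module prefix. It does not answer how the weights were produced. The key names in the toy dictionary are hypothetical, and the commented loading step assumes the file is an ordinary PyTorch checkpoint, possibly wrapped under a "model" key; neither is confirmed here.

```python
from collections import Counter


def summarize_prefixes(state_dict, depth=1):
    """Count entries per module prefix, e.g. 'a.b.weight' -> 'a' for depth=1."""
    counts = Counter()
    for key in state_dict:
        # Keep only the first `depth` dot-separated components of the key.
        prefix = ".".join(key.split(".")[:depth])
        counts[prefix] += 1
    return dict(counts)


# Loading the real file would look roughly like this (assumption: a regular
# PyTorch checkpoint, possibly with the state dict stored under "model"):
#   import torch
#   ckpt = torch.load("initial_weights_for_spatial_training.pt", map_location="cpu")
#   state_dict = ckpt.get("model", ckpt)

# Toy stand-in with hypothetical key names, so the sketch runs without the file.
toy_state_dict = {
    "scene_branch.conv1.weight": None,
    "scene_branch.bn1.weight": None,
    "head_branch.conv1.weight": None,
    "decoder.deconv1.weight": None,
}

for prefix, n in summarize_prefixes(toy_state_dict).items():
    print(prefix, n)
```

If the scene and head branches show up with a full set of ResNet-style layers, that would suggest the file carries pretrained backbone weights, but it would not show where they came from.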

Thank you very much.