Hello,
Thanks for the great work! In the training scripts for GazeFollow and VideoAttentionTarget, I see that the models are initialized from initial_weights_for_spatial_training.pt and initial_weights_for_temporal_training.pt, respectively.

The paper says that for VideoAttentionTarget you trained only the layers after the encoder, so I assume initial_weights_for_temporal_training.pt holds the weights obtained after training on GazeFollow. Is that correct?

However, the spatial model trained on GazeFollow is also initialized from a checkpoint, initial_weights_for_spatial_training.pt. How did you obtain these initial weights? Do they contain pretrained ResNet-50 weights for the scene and head branches?
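For context, this is roughly how I would check which parts of the model a checkpoint covers. It is only a sketch: I am assuming the .pt file holds a plain state dict keyed by dotted parameter names, which I have not verified, and the key names in the example (`scene_branch`, `head_branch`, `decoder`) are made up for illustration and are not taken from the repository.

```python
from collections import Counter


def summarize_prefixes(state_dict, depth=1):
    """Count entries per module prefix in a state-dict-like mapping.

    `depth` is how many dot-separated name components form the prefix.
    """
    counts = Counter()
    for key in state_dict:
        prefix = ".".join(key.split(".")[:depth])
        counts[prefix] += 1
    return dict(counts)


# Hypothetical key names, for illustration only; the real checkpoint may differ.
example_keys = {
    "scene_branch.conv1.weight": None,
    "scene_branch.bn1.weight": None,
    "head_branch.conv1.weight": None,
    "decoder.deconv1.weight": None,
}

summary = summarize_prefixes(example_keys)
print(summary)
```

With PyTorch, the real input would be the object returned by `torch.load("initial_weights_for_spatial_training.pt", map_location="cpu")`. If the weights are nested under another key in that file, the inner mapping would need to be passed instead.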
Thank you very much.