Chiaraplizz / ST-TR

Spatial Temporal Transformer Network for Skeleton-Based Activity Recognition
MIT License

code question #7

Closed wangzeyu135798 closed 3 years ago

wangzeyu135798 commented 3 years ago

Hi: In spatial_transformer.py, line 131:

```python
if (self.drop_connect and self.training):
    mask = torch.bernoulli((0.5) * torch.ones(B * self.Nh * V, device=device))
    mask = mask.reshape(B, self.Nh, V).unsqueeze(2).expand(B, self.Nh, V, V)
    weights = weights * mask
```

Why does multiplying the weights by the mask drop connections and avoid overfitting?

Chiaraplizz commented 3 years ago

Hi!

You can have a look at this paper: https://arxiv.org/abs/1907.11065. The idea is to drop attention weights obtained from self-attention, just as standard dropout drops features, which regularizes training.
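A minimal sketch of this mechanism, with illustrative names (the function and its signature are not from the repo): a Bernoulli mask of shape `(B, Nh, V)` is broadcast over the query dimension, so each zero in the mask removes one node's attention column for every query, dropping that "connection" for the whole head.

```python
import torch

def drop_attention(weights, drop_prob=0.5, training=True):
    """Randomly zero whole columns of the attention map (DropAttention-style).

    weights: (B, Nh, V, V) attention weights from self-attention.
    Illustrative sketch, not the repo's exact code.
    """
    if not training:
        return weights  # no dropping at inference time
    B, Nh, V, _ = weights.shape
    keep_prob = 1.0 - drop_prob
    # One Bernoulli draw per (batch, head, node) ...
    mask = torch.bernoulli(keep_prob * torch.ones(B, Nh, V, device=weights.device))
    # ... broadcast over the query dimension: a dropped node is
    # ignored by every query in that head.
    mask = mask.unsqueeze(2).expand(B, Nh, V, V)
    return weights * mask
```

Because entire attention columns are zeroed at random during training, the model cannot rely on any single pairwise connection, which is the drop-connect regularization effect the snippet implements.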

Chiara