cavalleria / cavaface

face recognition training project (PyTorch)
MIT License

Inconsistent Attention stages #86

Open xsacha opened 3 years ago

xsacha commented 3 years ago

https://github.com/cavalleria/cavaface.pytorch/blob/e9b9bd8ee06de51649ee202712d673f8e64415e9/backbone/resattnet.py#L79

Is it intentional that the stage1 attention block uses addition for the out_trunk, whereas the rest use multiplication?

Other repositories that implement this method appear to use multiplication here, which makes me believe it is a mistake. However, since this model achieves such good accuracy, I'm tempted to ask whether there was logic behind it, or whether the others should use addition as well.

See: https://github.com/MaczekO/AttentionNetworkProject/blob/58ccafc015fbe83dc789d721f5b4fb317b6ebc17/attention_module.py#L250
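For context, the Residual Attention Network paper combines the two branches as H(x) = (1 + M(x)) * T(x), where M is the soft-mask branch and T is the trunk branch. Below is a minimal sketch contrasting that multiplicative combination with an additive one. The function and variable names are illustrative, and the additive form is only one reading of what the stage1 code might be doing, not a copy of resattnet.py:

```python
import torch

def combine_multiplicative(out_trunk: torch.Tensor, out_softmask: torch.Tensor) -> torch.Tensor:
    # Paper formulation H(x) = (1 + M(x)) * T(x): the soft mask rescales the trunk,
    # and the +1 preserves an identity path so the mask acts as a residual gate.
    return (1 + out_softmask) * out_trunk

def combine_additive(out_trunk: torch.Tensor, out_softmask: torch.Tensor) -> torch.Tensor:
    # Additive variant (a hypothetical reading of the stage1 block): the mask
    # is summed with the trunk instead of scaling it.
    return (1 + out_softmask) + out_trunk

if __name__ == "__main__":
    trunk = torch.randn(2, 64, 56, 56)
    mask = torch.sigmoid(torch.randn(2, 64, 56, 56))  # soft mask values in (0, 1)
    # The two combinations generally differ, so the choice changes what the block learns.
    print(torch.allclose(combine_multiplicative(trunk, mask),
                         combine_additive(trunk, mask)))  # False in general
```

With the additive form the mask shifts the trunk features rather than gating them, which is why the inconsistency between stage1 and the later stages stands out.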