Attention vs Add in LKA

In Table 3, changing attention (mul) to add reduces VAN's accuracy from 75.4 to 74.6. I think a drop of 0.8 points is significant. However, in the ablation study you state that "Besides, replacing attention with adding operation is also not achieving a lower accuracy". Is it accurate to put it that way, given the 0.8-point drop?

Also, can't addition be treated as a type of attention function? Attention Mechanisms in Computer Vision: A Survey gives a general formula for attention. Can't I treat the function f in that formula as an addition operation?
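To make the question concrete, here is a minimal sketch in plain Python of the two variants being compared. It is not the VAN code: it assumes both variants combine the attention-branch output with the input element-wise, and the names `combine`, `x`, `attn`, and `mode` are hypothetical.

```python
# Hypothetical sketch, not the VAN implementation.
# `attn` stands in for the output of the attention branch, `x` for the input features.
def combine(x, attn, mode="mul"):
    """Combine input features with the attention-branch output, element-wise."""
    if mode == "mul":
        # Multiplicative variant: attn rescales each feature.
        return [a * v for a, v in zip(attn, x)]
    if mode == "add":
        # Additive variant: attn shifts each feature.
        return [a + v for a, v in zip(attn, x)]
    raise ValueError(f"unknown mode: {mode}")


x = [1.0, -2.0, 0.5]
attn = [0.0, 1.0, 2.0]

mul_out = combine(x, attn, "mul")  # [0.0, -2.0, 1.0]
add_out = combine(x, attn, "add")  # [1.0, -1.0, 2.5]
```

In this toy example, a zero in `attn` suppresses the corresponding feature under `mul` but leaves it unchanged under `add`, which is the behavioral difference I am asking about.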

Visual-Attention-Network / VAN-Classification, issue #32