In Table 3, changing attention (mul) to add reduces VAN performance from 75.4 to 74.6. I think this is a large drop. However, in the ablation study, you state that "Besides, replacing attention with adding operation is also not achieving a lower accuracy". Is it okay to phrase it that way, given that the performance drop is 0.8?
Also, can't add be treated as a type of attention function? Attention Mechanisms in Computer Vision: A Survey gives a general formula for attention that involves a function f. Can't I treat f there as an addition operation?
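To make the second question concrete, here is a minimal sketch of what I mean. It assumes the survey's formula has the general shape `f(g(x), x)`, where `g` produces an attention map and `f` combines it with the input; that shape is my reading and not copied from the paper. `attention_map` is a hypothetical toy stand-in for `g` (a neighbour average), not VAN's actual large-kernel branch, and the helper names are made up for illustration.

```python
from typing import Callable, List

Vector = List[float]


def attention_map(x: Vector) -> Vector:
    """Hypothetical stand-in for g(x): average each element with its neighbours."""
    n = len(x)
    weights = []
    for i in range(n):
        window = x[max(0, i - 1):min(n, i + 2)]
        weights.append(sum(window) / len(window))
    return weights


def apply_attention(x: Vector, f: Callable[[float, float], float]) -> Vector:
    """Compute f(g(x), x) element-wise for a chosen combining function f."""
    attn = attention_map(x)
    return [f(a, v) for a, v in zip(attn, x)]


def mul(a: float, v: float) -> float:
    """The 'attention (mul)' variant: the map rescales the input."""
    return a * v


def add(a: float, v: float) -> float:
    """The 'add' variant: the map is added to the input like a residual branch."""
    return a + v


x = [1.0, 2.0, 3.0]
out_mul = apply_attention(x, mul)
out_add = apply_attention(x, add)
print(out_mul, out_add)
```

Both variants fit the same `f(g(x), x)` template in this sketch; the only difference is whether the map scales the input or shifts it, which is why I am asking whether add should also count as attention.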