Closed valencebond closed 3 years ago
@valencebond We are sorry to have uploaded an older version of the code, which included the pure soft margin loss. In the latest version, we apply a normed soft margin loss, which can be viewed as a variant of the cosine distance and is more in line with our purpose. We have updated the code. We would also like to share an observation from our experiments: after accounting for the influence of randomly sampled tasks and of training, the two versions of the loss do not significantly affect the performance of the model.
Many thanks for your attention. If you have any additional questions, feel free to reopen the issue.
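To make the distinction concrete, here is a minimal sketch of the two loss variants as I understand them from this thread. The function names and the exact normalization are my assumptions, not the repository's actual code: the "pure" form applies a soft margin loss to the raw element-wise product, while the "normed" form L2-normalizes each mask first, so the similarity behaves like a cosine and cannot be inflated just by pushing every entry toward 1.

```python
import torch
import torch.nn.functional as F

def soft_margin_alignment(m_sg, m_ag):
    # Pure soft margin form (assumed): push every element of the
    # product m_sg * m_ag toward the positive target 1.
    return F.soft_margin_loss(m_sg * m_ag, torch.ones_like(m_sg))

def normed_soft_margin_alignment(m_sg, m_ag, eps=1e-8):
    # Normed variant (assumed): L2-normalize each mask so the summed
    # product is a cosine similarity in [0, 1] for non-negative masks;
    # uniformly inflating a mask no longer increases the similarity.
    m_sg_n = m_sg / (m_sg.norm(dim=-1, keepdim=True) + eps)
    m_ag_n = m_ag / (m_ag.norm(dim=-1, keepdim=True) + eps)
    cos = (m_sg_n * m_ag_n).sum(dim=-1)
    return F.soft_margin_loss(cos, torch.ones_like(cos))
```

Under this sketch, a mask compared against itself attains the minimal normed loss (cosine = 1), whereas the pure form still rewards making all entries larger.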
Thanks for your detailed code. But the attention alignment loss confuses me. According to Equations 8 and 9 in the paper, M^{sg} and M^{ag} lie in [0, 1] after the sigmoid function. Maximizing the element-wise multiplication between M_c^{ag} and M_c^{sg} minimizes loss_i^{cas}, so both M_c^{ag} and M_c^{sg} are inclined to become 1. This does not make M_c^{sg} align with M_c^{ag}, even if M_c^{ag} is fixed as the target.
In my opinion, the alignment loss makes the spatial attention (channel attention) of the self-attention branch attend to all spatial pixels (channel items) by driving M_c^{sg} and M_s^{sg} toward 1.
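The degenerate solution described above is easy to reproduce in a toy setting. The sketch below is my own illustration, not the repository's training loop: it treats the two masks as free sigmoid outputs and directly maximizes their element-wise product (i.e. minimizes its negative sum). Both masks collapse toward all-ones instead of one aligning with the other.

```python
import torch

# Toy setup (assumed): two unconstrained logit vectors whose sigmoids
# play the roles of M^{sg} and M^{ag}.
torch.manual_seed(0)
logits_sg = torch.randn(16, requires_grad=True)
logits_ag = torch.randn(16, requires_grad=True)
opt = torch.optim.SGD([logits_sg, logits_ag], lr=1.0)

for _ in range(1000):
    m_sg = torch.sigmoid(logits_sg)
    m_ag = torch.sigmoid(logits_ag)
    # Maximize the element-wise product, as in the questioner's reading
    # of Eqs. 8-9: the gradient on every logit is non-negative, so both
    # masks are pushed monotonically toward 1.
    loss = -(m_sg * m_ag).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After optimization, every entry of both sigmoid masks is close to 1, confirming that the unnormalized product alone does not enforce alignment with a fixed target.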