JianqiangRen / AAMS

Official repository of the paper 'Attention-aware Multi-stroke Style Transfer' (CVPR 2019)
https://sites.google.com/view/yuanyao/attention-aware-multi-stroke-style-transfer
MIT License

About the attention map #5

Closed: dehezhang2 closed this issue 3 years ago

dehezhang2 commented 4 years ago

Thanks for your excellent work! I want to ask about the filtered attention map.

According to Section 3.3 (Multi-stroke Fusion) of the paper, the residual learns to add variation to the original content feature map. Therefore, after filtering, the zero parts of the attention map should correspond to the crucial parts of the image (i.e., what should be preserved as the basic feature). However, as shown in the test result (say, for the content image "bird.jpg"), the area corresponding to the bird has higher filtered attention values than the background.

[Screenshot 2020-07-07 at 11 55 27 AM]

Doesn't this contradict the statement in the paper? I mean, the bird region, which should be preserved, will receive more variation than the background.

JianqiangRen commented 4 years ago

Thank you for your interest in our work! I guess you have misunderstood our statement in 3.3: "therefore the non-trivial (zero) parts of the residual deserve special attention". The parenthesis here means that trivial parts are zero, so the non-trivial parts are the important ones. During reconstruction, the residual learns to enhance the intrinsically crucial parts of the content feature by assigning large values to abs(A * f_x).
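
For concreteness, here is a minimal NumPy sketch of that reading (illustrative values only, not the repository's actual code): the residual A * f_x is what the self-attention module adds, and its absolute magnitude marks the crucial positions.

```python
import numpy as np

# Toy 1-D "feature map" f_x and a residual A * f_x that the self-attention
# module might learn. Values are illustrative only, not from the AAMS code.
f_x = np.array([0.2, 1.5, 0.1, 2.0])          # content features
residual = np.array([0.0, 0.9, -0.05, 1.1])   # A * f_x, learned residual

o_x = f_x + residual             # O_x = A * f_x + f_x  (paper, p. 4)
importance = np.abs(residual)    # large |A * f_x| -> crucial position
print(importance)                # [0.   0.9  0.05 1.1 ]
```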

dehezhang2 commented 4 years ago

> Thank you for your interest in our work! I guess you have misunderstood our statement in 3.3: "therefore the non-trivial (zero) parts of the residual deserve special attention". The parenthesis here means that trivial parts are zero, so the non-trivial parts are the important ones. During reconstruction, the residual learns to enhance the intrinsically crucial parts of the content feature by assigning large values to abs(A * f_x).

Thanks for replying. Then may I ask about the usage of 'abs'? According to the formula O_x = A * f_x + f_x on page 4 of the paper, if the attention value is negative (say, -1), the residual will be the negative of the corresponding feature map, which results in a zero output feature map. If the attention value is positive (say, 1), the feature map will be doubled. I think the features corresponding to attention values 1 and -1 have different importance. So why is 'abs' used for the attention filter? It gives features with attention values 1 and -1 the same importance.
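
To put the concern in numbers (a toy sketch of the formula above, not the paper's code): with a constant attention value a, the output is (1 + a) * f_x, so a = -1 erases the feature while a = +1 doubles it, even though abs treats both the same.

```python
import numpy as np

# Toy check of O_x = A * f_x + f_x with a constant attention value a.
# a = -1 and a = +1 have the same abs() but opposite effects on the feature.
f_x = np.array([1.0, 2.0, 3.0])

for a in (-1.0, 1.0):
    o_x = a * f_x + f_x          # equals (1 + a) * f_x
    print(a, o_x)
# -1.0 [0. 0. 0.]   feature erased
#  1.0 [2. 4. 6.]   feature doubled
```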

I also tried to generate the attention map myself (the attention map before filtering). If I assign -1 to the attention map, the reconstructed image changes a lot.

[Screenshot 2020-07-07 at 6 43 02 PM]

If I assign 1 to the attention map, the reconstructed image is similar to the input.

[Screenshot 2020-07-07 at 6 43 28 PM]

So why is it reasonable to use abs to generate the filtered attention map?

JianqiangRen commented 4 years ago

The self-attention module enhances the network's feature-extraction ability by introducing changes (the residual) to the original feature maps of the middle layers, so a position with a large value change is one that receives more attention from the self-attention module. And a change should be formulated as an absolute value, shouldn't it? For example, for both (x+1) and (x-1), x has changed by abs(±1) = 1.
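
A small sketch of that argument (mine, not the authors' code; change_magnitude is a hypothetical helper): the filtered attention map measures how much each position changed, i.e., the absolute value of the residual, here normalized to [0, 1] for visualization.

```python
import numpy as np

def change_magnitude(f_x, o_x, eps=1e-8):
    """|O_x - f_x| = |A * f_x|, normalized for visualization.

    Hypothetical helper for illustration; the AAMS code may differ."""
    mag = np.abs(o_x - f_x)           # |x+1 - x| == |x-1 - x| == 1
    return mag / (mag.max() + eps)

f_x = np.array([1.0, 2.0, 3.0, 4.0])
o_x = np.array([2.0, 1.0, 3.0, 6.5])  # toy features after adding the residual
print(change_magnitude(f_x, o_x))     # [0.4 0.4 0.  1. ]
```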