Closed: haolin512900 closed this issue 3 years ago
Hi, sorry for the late reply.

Using patches as convolution kernels to compute inner-product similarities among patches has been done in previous studies, most prominently in 'Generative Image Inpainting with Contextual Attention', on which our attention mechanism is based. This operation essentially finds the most similar patches for feature reconstruction.

To understand the concept, think of a small patch being convolved with itself and with other patches. Because the values match when a patch is convolved with itself, that convolution produces a larger response than convolution with the other patches. If we then take the softmax over the whole map of convolution responses, the self-convolved patch gets a higher probability than the other patches. When we perform this operation for every patch in a feature map, the softmax picks out the patches with the highest similarity values, which is essentially the same as computing the inner-product similarity.

We want an explicit local attention mechanism because we need texture consistency between the inpainted region and the background region. However, we also want to remove the influence of the mask values while computing the local similarity. That's why we prune the mask before calculating the local similarity.

Hope this helps! Please let me know if you have any more questions. Cheers!
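A minimal PyTorch sketch of this patch-as-kernel idea (not the repository's exact code; the function name, tensor shapes, and kernel size are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def patch_similarity(feat, ksize=3):
    # feat: (1, C, H, W) feature map from a single image (sketch assumes batch size 1).
    b, c, h, w = feat.shape

    # Extract k x k patches and reshape each one into a convolution kernel of
    # shape (C, ksize, ksize), so there is one output channel per patch.
    patches = F.unfold(feat, kernel_size=ksize, stride=1, padding=ksize // 2)
    n_patches = patches.shape[-1]
    kernels = patches.transpose(1, 2).reshape(n_patches, c, ksize, ksize)

    # L2-normalise each kernel so the convolution response behaves like a
    # cosine/inner-product similarity rather than being dominated by magnitude.
    kernels = kernels / (kernels.flatten(1).norm(dim=1).view(-1, 1, 1, 1) + 1e-8)

    # Convolving the feature map with its own patches gives, at every location,
    # the inner product between the local patch and every extracted patch.
    scores = F.conv2d(feat, kernels, stride=1, padding=ksize // 2)  # (1, n_patches, H, W)

    # Softmax over the patch dimension: the patch most similar to a location
    # (including the location's own patch) receives the highest attention weight.
    return F.softmax(scores, dim=1)

# Example usage.
feat = torch.randn(1, 16, 32, 32)
attn = patch_similarity(feat)
print(attn.shape)  # torch.Size([1, 1024, 32, 32])
```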
Hello! Please forgive my ignorance. My understanding is that the function of this prop_kernel is to find the most similar patch? Why can it play such a role, and why are all of its elements 1?
Hi, I am sorry, I misunderstood the question earlier. This is embarrassing...
The prop_kernels are basically used for attention propagation, which is based on the baseline paper. Quoting from the baseline paper (i.e. Generative Image Inpainting with Contextual Attention):
We further encourage coherency of attention by propagation (fusion). The idea of coherency is that a shift in foreground patch is likely corresponding to an equal shift in background patch for attention.... To model and encourage coherency of attention maps, we do a left-right propagation followed by a top-down propagation with kernel size of k..... The propagation is efficiently implemented as convolution with identity matrix as kernels.
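For concreteness, here is a minimal PyTorch sketch of that fusion step (simplified from the baseline contextual-attention implementation, not the repository's exact prop_kernels code; the names scores and fuse_k and the single-image layout are assumptions):

```python
import torch
import torch.nn.functional as F

def propagate_attention(scores, h, w, fuse_k=3):
    # scores: (1, h*w, h*w) raw similarity between every foreground location
    # (rows) and every background patch (columns).
    ident = torch.eye(fuse_k).view(1, 1, fuse_k, fuse_k)  # identity-matrix kernel

    # Treat the score matrix as a single-channel image and convolve it with the
    # identity kernel: a score is reinforced when equally shifted foreground and
    # background locations are also similar (left-right coherency).
    s = scores.view(1, 1, h * w, h * w)
    s = F.conv2d(s, ident, padding=fuse_k // 2)

    # Reorder the axes so the same convolution propagates along the other
    # spatial direction (top-down coherency), then restore the original layout.
    s = s.view(1, h, w, h, w).permute(0, 2, 1, 4, 3).reshape(1, 1, h * w, h * w)
    s = F.conv2d(s, ident, padding=fuse_k // 2)
    s = s.view(1, w, h, w, h).permute(0, 2, 1, 4, 3).reshape(1, h * w, h * w)
    return s

# Example usage with a 16 x 16 attention grid.
h = w = 16
fused = propagate_attention(torch.randn(1, h * w, h * w), h, w)
print(fused.shape)  # torch.Size([1, 256, 256])
```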
Let me know if you have more questions. Cheers!
Thank you for your reply. I am very happy with your patient answer. If I have any more questions, I will continue to consult you. Good luck to you!
Thank you for showing interest in our project. Please close the issue if you feel resolved. Cheers!
Thank you for your great project! In your code model/Attention.py, where did the idea of using prop_kernels as the convolution kernel for conv_results in the class GlobalLocalAttention come from, and why did you want to do this? I look forward to your reply, good luck to you!