TolgaOk / Differentiable-Hard-Attention-Module

A modified version of Spatial Transformer Networks to attend regions on the input image.

ask for help #1

Open 22wei22 opened 6 years ago

22wei22 commented 6 years ago

I understand how to get mean_x and mean_y, but I do not understand why scale = ((difference_x.pow(2) + difference_y.pow(2)).sqrt()*softmaxed_map).sum(-1).sum(-1).

What does this code mean?

TolgaOk commented 6 years ago

This can be interpreted as the expected L2 norm:

$$\text{scale} = \sum_{x,y} p(x, y)\,\sqrt{(x - \mu_x)^2 + (y - \mu_y)^2}$$

where the probability distribution p(x, y) is given by the 2D softmax over the feature map. Note that this is different from the variance or the standard deviation.
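For a concrete picture, here is a minimal PyTorch sketch of the computation (a sketch assuming the names mean_x, mean_y, and scale from the snippet above; coordinates are normalized to [-1, 1], which is also the convention expected by torch.nn.functional.affine_grid):

import torch

def attention_stats(feature_map):
    # feature_map: (batch, H, W) tensor of unnormalized attention scores.
    b, h, w = feature_map.shape
    # 2D softmax over all spatial positions.
    softmaxed_map = torch.softmax(feature_map.view(b, -1), dim=-1).view(b, h, w)
    # Coordinate grids, normalized to [-1, 1].
    ys = torch.linspace(-1, 1, h).view(1, h, 1)
    xs = torch.linspace(-1, 1, w).view(1, 1, w)
    # Expected position under the softmax distribution.
    mean_x = (softmaxed_map * xs).sum(-1).sum(-1)  # (batch,)
    mean_y = (softmaxed_map * ys).sum(-1).sum(-1)  # (batch,)
    # Expected L2 distance from the mean: the "scale".
    difference_x = xs - mean_x.view(b, 1, 1)
    difference_y = ys - mean_y.view(b, 1, 1)
    scale = ((difference_x.pow(2) + difference_y.pow(2)).sqrt()
             * softmaxed_map).sum(-1).sum(-1)      # (batch,)
    return mean_x, mean_y, scale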

22wei22 commented 6 years ago

Thanks. Have you completed the hard-attention code that can automatically crop the region of interest from an image? Hard attention is not differentiable, and I cannot find a reference.

TolgaOk commented 6 years ago

This code is based on Spatial Transformer Networks. They propose a differentiable transformation given the necessary transformation parameters, so that gradients can flow through those parameters. In this work, I use that differentiable transformation with parameters obtained via expectations (the mean and the expected L2 norm) to crop the region of interest by transforming the input image. This is why it is differentiable. You can apply the Dham module on top of any architecture. In the example, I used it for a classification task, and the only supervision was the label information.
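Concretely, one plausible way to wire this up (a minimal sketch of the STN-style crop, not the exact repository code; the function name crop_region and the parameter ranges are my assumptions) is to build an affine matrix from the expectations and apply PyTorch's differentiable sampler:

import torch
import torch.nn.functional as F

def crop_region(image, mean_x, mean_y, scale):
    # image: (batch, C, H, W); mean_x, mean_y in [-1, 1]; scale in (0, 1].
    zeros = torch.zeros_like(scale)
    # Zoom-and-translate affine matrix: [[s, 0, tx], [0, s, ty]].
    theta = torch.stack([
        torch.stack([scale, zeros, mean_x], dim=-1),
        torch.stack([zeros, scale, mean_y], dim=-1),
    ], dim=1)                                              # (batch, 2, 3)
    # Differentiable sampling, as in Spatial Transformer Networks:
    # gradients flow back into mean_x, mean_y, and scale.
    grid = F.affine_grid(theta, image.size(), align_corners=False)
    return F.grid_sample(image, grid, align_corners=False)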

22wei22 commented 6 years ago

Thanks. If I want the model to produce T regions of interest from an image, I should train T sets of (mean_x, mean_y, scale) parameters. Am I right?


TolgaOk commented 6 years ago

Yes, you are right. For example, if you want to get more than one region at each forward pass, you can feed the Dham layer a multi-channel feature map. Although I haven't tried it, you could use a convolutional layer with more than one output channel before feeding it to the Dham module:

import torch
import torch.nn as nn

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Attention branch: produces the feature map fed to the Dham module.
        self.conv_att_1 = nn.Conv2d(1, 48, kernel_size=5)
        self.conv_att_2 = nn.Conv2d(48, 1, kernel_size=5)
        self.batchnorm2d_att = nn.BatchNorm2d(1)

        # Differentiable hard-attention module (from this repository).
        self.attention = Dham((28, 28))

        # Classification branch.
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

You can change self.conv_att_2 = nn.Conv2d(48, 1, kernel_size=5) to self.conv_att_2 = nn.Conv2d(48, x, kernel_size=5), where x is the number of regions.
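For illustration, a hedged sketch of the multi-region usage; it assumes the Dham forward takes the input image together with a single-channel attention map, which may differ from the actual signature in this repository:

import torch

def multi_region_forward(net, image):
    # Attention branch produces T channels, one attention map per region.
    att = net.conv_att_2(torch.relu(net.conv_att_1(image)))  # (batch, T, H, W)
    # Hypothetical: apply the attention module once per channel to get T crops.
    return [net.attention(image, att[:, t:t + 1])
            for t in range(att.size(1))]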