ask for help (#1) · Open · 22wei22 opened this issue 6 years ago
This can be interpreted as the expected L2 norm, where the probability distribution is given by a softmax over the 2D feature map. Note that this is a different quantity from the variance or the standard deviation.
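A rough sketch of that interpretation in PyTorch (the tensor names below, such as feature_map and softmaxed_map, are chosen to mirror the snippet quoted at the end of this thread and may not match the repository exactly):

import torch

feature_map = torch.randn(1, 1, 28, 28)           # single-channel activation map
b, c, h, w = feature_map.shape

# Softmax over all spatial positions: the map becomes a 2D probability distribution.
softmaxed_map = torch.softmax(feature_map.view(b, c, -1), dim=-1).view(b, c, h, w)

# Coordinate grids in [-1, 1], one value per pixel.
ys = torch.linspace(-1, 1, h).view(1, 1, h, 1)
xs = torch.linspace(-1, 1, w).view(1, 1, 1, w)

# Means: expected coordinates under that distribution.
mean_x = (xs * softmaxed_map).sum(-1).sum(-1)     # shape (b, c)
mean_y = (ys * softmaxed_map).sum(-1).sum(-1)

# Scale: expected L2 distance from the mean, i.e. E[ ||p - mu|| ],
# which is not the same thing as the variance or standard deviation.
difference_x = xs - mean_x.view(b, c, 1, 1)
difference_y = ys - mean_y.view(b, c, 1, 1)
scale = ((difference_x.pow(2) + difference_y.pow(2)).sqrt()
         * softmaxed_map).sum(-1).sum(-1)         # shape (b, c)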
Thanks. Have you finished the code for hard attention that automatically crops the region of interest from an image? Cropping is not differentiable, and I cannot find a reference.
This code is based on Spatial Transformer Networks, which propose a differentiable transformation given the necessary transformation parameters, so that gradients can flow through those parameters. In this work, I use that differentiable transformation with parameters obtained via expectations (the mean and the expected L2 norm) to crop the region of interest by transforming the input image. This is why it is differentiable. You can apply the Dham module on top of any architecture; in the example I used it for a classification task, and the only supervision was the label information.
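For illustration, here is a minimal sketch of such a differentiable crop with PyTorch's spatial-transformer sampler, assuming mean_x, mean_y, and scale are already expressed in the [-1, 1] grid convention. The function name and exact theta layout are my own, not necessarily the repository's:

import torch
import torch.nn.functional as F

def differentiable_crop(image, mean_x, mean_y, scale):
    # Builds a 2x3 affine matrix per sample:
    #   [scale, 0,     mean_x]
    #   [0,     scale, mean_y]
    # so the sampling grid is centered on (mean_x, mean_y) and zoomed by scale.
    # Both steps are differentiable, so gradients reach all three parameters.
    zeros = torch.zeros_like(scale)
    theta = torch.stack([
        torch.stack([scale, zeros, mean_x], dim=-1),
        torch.stack([zeros, scale, mean_y], dim=-1),
    ], dim=1)                                    # shape (batch, 2, 3)
    grid = F.affine_grid(theta, image.size(), align_corners=False)
    return F.grid_sample(image, grid, align_corners=False)

# Usage: mean_x, mean_y, scale each of shape (batch,), in [-1, 1] coordinates.
image = torch.randn(4, 1, 28, 28)
crop = differentiable_crop(image, torch.zeros(4), torch.zeros(4),
                           torch.full((4,), 0.5))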
Thanks. If I want the model to produce T regions of interest per image, should I train T triples of (mean_x, mean_y, scale)? Am I right?
Yes, you are right. For example, if you want to get more than one region per forward pass, you can feed the Dham layer a multi-channel feature map. Although I haven't tried it, you can use a convolutional layer with more than one output channel before feeding it to the Dham module.
import torch
import torch.nn as nn

# Dham is the attention module defined in this repository.
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Attention branch: produces the single-channel map fed to Dham.
        self.conv_att_1 = nn.Conv2d(1, 48, kernel_size=5)
        self.conv_att_2 = nn.Conv2d(48, 1, kernel_size=5)
        self.batchnorm2d_att = nn.BatchNorm2d(1)
        self.attention = Dham((28, 28))
        # Classifier applied to the attended (cropped) image.
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)
You can change self.conv_att_2 = nn.Conv2d(48, 1, kernel_size=5) to self.conv_att_2 = nn.Conv2d(48, x, kernel_size=5), where x is the number of regions.
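For what it's worth, a small sketch of this multi-region variant (untested, as noted above; the tensor att stands in for the widened conv_att_2 output). A per-channel softmax yields one probability map per region, and the same expectations as in the first sketch then run per channel, giving mean_x, mean_y, and scale of shape (batch, T), one triple per region:

import torch

T = 3                                            # number of regions ("x" above)
att = torch.randn(1, T, 20, 20)                  # stand-in for the conv_att_2 output
b, c, h, w = att.shape
p = torch.softmax(att.view(b, c, -1), dim=-1).view(b, c, h, w)
assert torch.allclose(p.sum(-1).sum(-1), torch.ones(b, c))  # one distribution per channel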
I understand how to get mean_x and mean_y, but I do not understand why scale = ((difference_x.pow(2) + difference_y.pow(2)).sqrt()*softmaxed_map).sum(-1).sum(-1). What does this code mean?