pherrusa7 opened this issue 5 years ago
Also, at line 172, do we need to do

```python
positive_gradients = F.relu(score.exp() * gradients)  # ReLU(dY/dA) == ReLU(exp(S) * dS/dA)
```

or is it enough to do just

```python
positive_gradients = F.relu(gradients)
```

?
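For what it's worth, exp(S) is strictly positive, so the two expressions keep exactly the same cells and differ only by a per-sample positive scale factor. A minimal sketch, assuming `score` is a scalar class score (the shapes here are made up, not the repo's actual ones):

```python
import torch
import torch.nn.functional as F

score = torch.randn(())              # S: a scalar class score (assumed shape)
gradients = torch.randn(1, 4, 7, 7)  # dS/dA for one batch of activation maps

relu_scaled = F.relu(score.exp() * gradients)  # ReLU(dY/dA) with Y = exp(S)
scaled_relu = score.exp() * F.relu(gradients)  # exp(S) * ReLU(dS/dA)

# exp(S) > 0, so both produce the same sign mask; only the magnitude scales.
print(torch.allclose(relu_scaled, scaled_relu))  # True
```

So the ReLU mask is identical either way; whether the uniform exp(S) scale matters downstream depends on how the resulting weights are normalized afterwards.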
@yangfantrinity anything which comes after # is a comment in Python.
@devraj89 Sure, of course I know that anything after # is a comment.
May I suggest you read my question one more time?
Dear @1Konny,
Thanks for your implementation!
I have found that line 168 in gradcam.py:

```python
alpha_denom = gradients.pow(2).mul(2) + \
    activations.mul(gradients.pow(3)).view(b, k, u*v).sum(-1, keepdim=True).view(b, k, 1, 1)
```

should be:

```python
global_sum = activations.view(b, k, u*v).sum(-1, keepdim=True).view(b, k, 1, 1)
alpha_denom = gradients.pow(2).mul(2) + global_sum.mul(gradients.pow(3))
```
This is because of Eq. 19 in the paper [@adityac94]:

$$\alpha_{ij}^{kc} = \frac{\partial^2 Y^c / (\partial A_{ij}^k)^2}{2\,\partial^2 Y^c / (\partial A_{ij}^k)^2 + \sum_{a}\sum_{b} A_{ab}^k \,\partial^3 Y^c / (\partial A_{ij}^k)^3}$$

If you pay attention, you first need to compute the sum over all (a, b) for each activation map k. That gives one weight per activation map k, which is then used as a multiplier of the third-order gradient term at every position (i, j) of the same map k. In your implementation, you first multiply each activation cell A_{ab}^k by the gradient at the same position and then sum over (i, j), mixing the indices (a, b) and (i, j), but they are independent. I hope it's clear :)
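A minimal sketch contrasting the two computations (the concrete tensor sizes are assumptions; `b, k, u, v` follow the shapes used in gradcam.py):

```python
import torch

# Toy shapes: batch b, activation maps k, spatial grid u x v.
b, k, u, v = 1, 4, 7, 7
activations = torch.randn(b, k, u, v)  # A^k_{ab}
gradients = torch.randn(b, k, u, v)    # gradients w.r.t. A^k_{ij}

# Original line 168: multiplies A by grad^3 cell-wise before summing,
# entangling the activation indices (a, b) with the gradient indices (i, j).
alpha_denom_old = gradients.pow(2).mul(2) + \
    activations.mul(gradients.pow(3)).view(b, k, u*v).sum(-1, keepdim=True).view(b, k, 1, 1)

# Eq. 19: reduce A^k over all spatial positions first, then scale grad^3.
global_sum = activations.view(b, k, u*v).sum(-1, keepdim=True).view(b, k, 1, 1)
alpha_denom_new = gradients.pow(2).mul(2) + global_sum.mul(gradients.pow(3))

# The two denominators generally disagree cell-wise.
print(torch.allclose(alpha_denom_old, alpha_denom_new))  # False in general
```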
P.S.: I have fixed this in gradcam.py and added a flag in example.ipynb to automatically detect whether CUDA can be used, falling back to CPU otherwise. I will open a pull request. Thanks for your time!
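The flag described above is presumably the standard PyTorch device-detection pattern; a sketch of what such a check usually looks like (the actual variable names in example.ipynb may differ):

```python
import torch

# Pick CUDA when available, otherwise fall back to CPU.
use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')

# Models and inputs then simply follow the chosen device, e.g.:
x = torch.randn(1, 3, 224, 224, device=device)
print(device, x.device)
```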