adityac94 / Grad_CAM_plus_plus

A generalized gradient-based CNN visualization technique
284 stars 56 forks

Doubt in the derivatives from eq7 to eq8 #6

Open pherrusa7 opened 5 years ago

pherrusa7 commented 5 years ago

Dear @adityac94 , @tataiani

If I understand correctly, you use the following assumption to go from eq.7 to eq.8, and from eq.8 to eq.9:

$\frac{\partial^2 Y^C}{\partial A^k_{ab} \partial A^k_{ij}} = 0, \text{ if } (a,b) \neq (i, j)$ [1]

Can you please provide an explanation for this?

I am also confused about why $\frac{\partial^2 Y^C}{(\partial A^k_{ij})^2} \neq 0$ [2], since it seems to me that $\frac{\partial Y^C}{\partial A^k_{ij}} = C\text{ (Constant)}$ [3].

Thank you for your time and efforts in advance!


mt-cly commented 4 years ago

Hi, although I am also confused about the motivation for introducing conv_third/second_grad, your formula does not match the authors' code. In the code, the derivative is based on exp(Yc) rather than on Yc as described in the paper, so the second and higher-order gradients can be calculated by repeatedly multiplying by the gradient value.
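
A minimal numerical sketch of that point, under assumptions that are not taken from the authors' code: the pre-exponential score is a hypothetical linear function $S(A) = \sum_i w_i A_i$ of one flattened toy feature map, and $Y = \exp(S)$. In that setting every second derivative is $\exp(S)\, w_i w_j$, so the diagonal entries are obtained by multiplying by the first-order gradient again. The cross terms with $i \neq j$ are generally nonzero in this toy case, which bears on assumption [1].

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=4)  # hypothetical weights of a linear score S(A) = w . A
A = rng.normal(size=4)  # one flattened toy feature map (illustrative only)


def Y(a):
    """Toy class score Y = exp(S), with S linear in the activations."""
    return np.exp(np.dot(w, a))


def numerical_hessian(f, a, eps=1e-3):
    """Central finite-difference estimate of all second derivatives of f at a."""
    n = a.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = eps
            ej[j] = eps
            H[i, j] = (f(a + ei + ej) - f(a + ei - ej)
                       - f(a - ei + ej) + f(a - ei - ej)) / (4 * eps**2)
    return H


# Closed form for this toy model: d2Y / dA_i dA_j = exp(S) * w_i * w_j
analytic_hessian = Y(A) * np.outer(w, w)
numeric_hessian = numerical_hessian(Y, A)

matches = np.allclose(numeric_hessian, analytic_hessian,
                      rtol=1e-4, atol=1e-8 * Y(A))
print("finite differences match exp(S) * w_i * w_j:", matches)
print("diagonal (second derivatives):", np.diag(analytic_hessian))
print("one cross-derivative (i=0, j=1):", analytic_hessian[0, 1])
```

This only illustrates the structure of the derivatives of an exponential of a linear score; it says nothing about how a real network with nonlinear layers after the feature maps behaves.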

mlerma54 commented 2 years ago

I have a similar issue: if we take the partial derivative of (7) with respect to $A^k_{ij}$, we do not get (8) unless we assume that the cross-derivatives are zero. However, (7) can be seen as a system of linear equations with the alphas as unknowns. Since the system has more unknowns than equations, it is in general underdetermined and has infinitely many solutions. Assuming that the cross-derivatives are zero imposes additional restrictions on the unknowns and reduces the degrees of freedom in the space of solutions, leading to a possible formula for the alphas. However, I do not see any particular reason to assume that the cross-derivatives are zero, other than the pragmatic one of getting an equation from which the alphas can be isolated.

That said, note that the alphas computed this way may no longer be a solution of the original equation (7), because the derivatives used to get (9) eliminate linear terms. In other words, if we add any linear function of the $A^k_{ij}$ to $Y^c$, the method used in the paper still produces the same alphas, so there is no guarantee that the alphas obtained actually solve equation (7).
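
The invariance can be written out in one line. This is a sketch; the coefficients $b^k_{ij}$ are hypothetical constants introduced only for illustration. Define a modified score

$$\tilde{Y}^c = Y^c + \sum_{k}\sum_{i,j} b^k_{ij} A^k_{ij}.$$

Differentiating gives

$$\frac{\partial \tilde{Y}^c}{\partial A^k_{ij}} = \frac{\partial Y^c}{\partial A^k_{ij}} + b^k_{ij}, \qquad \frac{\partial^n \tilde{Y}^c}{(\partial A^k_{ij})^n} = \frac{\partial^n Y^c}{(\partial A^k_{ij})^n} \quad \text{for } n \geq 2.$$

So any expression for the alphas built only from second and higher derivatives takes the same value for $Y^c$ and $\tilde{Y}^c$, while the score itself and its first derivatives differ by the added linear term.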