khanrc opened this issue 6 years ago
Hi, please refer to our paper (https://arxiv.org/pdf/1710.11063.pdf) for a detailed explanation of the gradients, in particular Eqs. 11, 15 and 16.
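For reference, the chain-rule step those equations rely on can be sketched as follows. This is a summary, not a restatement of the paper's numbered equations, and it assumes the class score S^c is a piecewise-linear function of the feature maps A^k (as in a ReLU network), so its second derivative vanishes. Only element-wise (diagonal) derivatives appear, and all of them follow from the first-order gradient:

```latex
Y^c = \exp(S^c), \qquad \frac{\partial^2 S^c}{(\partial A^k_{ij})^2} = 0 \quad \text{(assumed)}

\frac{\partial Y^c}{\partial A^k_{ij}} = \exp(S^c)\,\frac{\partial S^c}{\partial A^k_{ij}}, \qquad
\frac{\partial^2 Y^c}{(\partial A^k_{ij})^2} = \exp(S^c)\left(\frac{\partial S^c}{\partial A^k_{ij}}\right)^2, \qquad
\frac{\partial^3 Y^c}{(\partial A^k_{ij})^3} = \exp(S^c)\left(\frac{\partial S^c}{\partial A^k_{ij}}\right)^3

\alpha^{kc}_{ij}
= \frac{\dfrac{\partial^2 Y^c}{(\partial A^k_{ij})^2}}
       {2\,\dfrac{\partial^2 Y^c}{(\partial A^k_{ij})^2} + \sum_a \sum_b A^k_{ab}\,\dfrac{\partial^3 Y^c}{(\partial A^k_{ij})^3}}
= \frac{\left(\partial S^c / \partial A^k_{ij}\right)^2}
       {2\left(\partial S^c / \partial A^k_{ij}\right)^2 + \sum_a \sum_b A^k_{ab}\left(\partial S^c / \partial A^k_{ij}\right)^3}
```

The factor exp(S^c) cancels in the last expression, so the alphas depend only on the first-order gradient of the score and the activations.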
The approach you suggested won't work because tf.gradients() sums the partial derivatives of all outputs with respect to each input element. Ideally, tf.gradients(tf.gradients(Y, A), A) would be the Hessian, of size size(tf.gradients(Y, A)) x size(A). Instead, you get a tensor of size(A).
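A small NumPy illustration of this summation behaviour (the toy function f and the helper names are hypothetical, and central differences stand in for autodiff): differentiating the sum of a gradient vector returns one number per input element, namely the column sums of the Hessian, rather than the full n x n Hessian.

```python
import numpy as np

# Hypothetical toy score: f(a) = a0^2 * a1 + sin(a2) * a0 over a 3-element input.
def grad_f(a):
    # Analytic gradient: one partial derivative per input element.
    return np.array([2 * a[0] * a[1] + np.sin(a[2]),
                     a[0] ** 2,
                     a[0] * np.cos(a[2])])

def hessian_f(a):
    # Analytic Hessian: 3 x 3, entry (i, j) = d2f / (da_i da_j).
    return np.array([[2 * a[1], 2 * a[0], np.cos(a[2])],
                     [2 * a[0], 0.0, 0.0],
                     [np.cos(a[2]), 0.0, -a[0] * np.sin(a[2])]])

def grad_of_sum(fn, a, eps=1e-5):
    # Mimics what tf.gradients(ys, xs) returns: the derivative of SUM(ys)
    # with respect to each element of xs (central differences here).
    out = np.zeros_like(a)
    for i in range(a.size):
        e = np.zeros_like(a)
        e[i] = eps
        out[i] = (np.sum(fn(a + e)) - np.sum(fn(a - e))) / (2 * eps)
    return out

a = np.array([0.5, -1.0, 0.3])
nested = grad_of_sum(grad_f, a)  # analogue of the nested gradient call
H = hessian_f(a)                 # the full Hessian one actually wants

print(nested.shape, H.shape)                          # (3,) (3, 3)
print(np.allclose(nested, H.sum(axis=0), atol=1e-6))  # True: column sums of H
print(np.allclose(nested, np.diag(H)))                # False: not the diagonal
```

The diagonal second derivatives that the alpha formula needs are mixed with off-diagonal terms in that sum, so they cannot be read back out of the nested result.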
Hope that clears things up. Get back to me if you have any more concerns.
I cannot understand your code for computing derivatives:
My questions are: 1) Why did you multiply by exp(cost)? 2) How are the second and third derivatives calculated in the code? I think they should be computed like this:
second derivative: tf.gradients(tf.gradients(Y, A), A)
third derivative: tf.gradients(tf.gradients(tf.gradients(Y, A), A), A)
Can you help me?
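One way to see where the exp(cost) factor comes from is a NumPy sketch, not the repository's TensorFlow code. It assumes cost plays the role of the pre-softmax class score S, that the class output is taken as Y = exp(S), and that S is piecewise linear in A. The helper grad_cam_pp_weights and the toy arrays are hypothetical. Under those assumptions, each higher derivative of Y is exp(S) times a power of the ordinary gradient dS/dA. The last lines check one second derivative by finite differences.

```python
import numpy as np

def grad_cam_pp_weights(acts, grads, score):
    """Hypothetical helper: Grad-CAM++ alphas and channel weights.

    acts:  feature maps A, shape (H, W, K)
    grads: dS/dA for the pre-softmax class score S, same shape as acts
    score: scalar S; the class output is taken to be Y = exp(S)
    Assumes S is piecewise linear in A, so higher derivatives of S are 0.
    """
    e = np.exp(score)
    first = e * grads          # dY/dA   = exp(S) * dS/dA
    second = e * grads ** 2    # d2Y/dA2 = exp(S) * (dS/dA)^2
    third = e * grads ** 3     # d3Y/dA3 = exp(S) * (dS/dA)^3

    # alpha_ij = Y''_ij / (2 Y''_ij + (sum_ab A_ab) * Y'''_ij), per channel
    global_sum = np.sum(acts, axis=(0, 1), keepdims=True)
    denom = 2.0 * second + global_sum * third
    denom = np.where(denom != 0.0, denom, 1.0)  # avoid division by zero
    alpha = second / denom

    # w_k = sum_ij alpha_ij^k * relu(dY/dA_ij^k)
    weights = np.sum(alpha * np.maximum(first, 0.0), axis=(0, 1))
    return alpha, weights


# Toy linear score S = sum(Wt * A), so dS/dA = Wt exactly.
A = np.linspace(0.0, 1.0, 32).reshape(4, 4, 2)
Wt = np.linspace(-0.2, 0.2, 32).reshape(4, 4, 2)
S = np.sum(Wt * A)
alpha, w = grad_cam_pp_weights(A, Wt, S)
print(alpha.shape, w.shape)  # (4, 4, 2) (2,)

# exp(S) cancels in alpha: the same alphas come out for any score value.
print(np.allclose(alpha, grad_cam_pp_weights(A, Wt, 0.0)[0]))  # True

# Finite-difference check of d2Y/dA2 at one element, with Y = exp(S(A)).
idx, eps = (1, 2, 0), 1e-3

def Y(delta):
    B = A.copy()
    B[idx] += delta
    return np.exp(np.sum(Wt * B))

fd_second = (Y(eps) - 2.0 * Y(0.0) + Y(-eps)) / eps ** 2
print(np.isclose(fd_second, np.exp(S) * Wt[idx] ** 2, rtol=1e-4))  # True
```

Because every derivative is built from the same first-order gradient, a single ordinary gradient call suffices. exp(S) drops out of alpha and only rescales the final weights by a positive constant.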