Hi!
Do I understand correctly that the grand loss at the end will backprop through grad of grad of grad, so not a double backward but, e.g., a 20th-order backward?
I.e. student_params[5] depends on student_params[4] and grad(loss(target; student_params[4])), and the same holds for every earlier step, so the computation graph contains a path that goes through all 5 grad computations?
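
To make the question concrete, here is a toy scalar sketch of the unrolling I have in mind. It is not the repository's code: the names (`synthetic`, `theta`, `unroll`, `grand_loss`), the quartic inner loss, and the step count of 5 are all assumptions chosen for illustration. It applies the chain rule by hand through every update and checks the result against finite differences. In this toy, at least, the derivative of the grand loss does pass through all 5 updates, although each step only contributes a second derivative of the inner loss.

```python
# Toy scalar sketch (hypothetical names; not the repository's code).
# "synthetic" stands in for the learnable data, "theta" for the student parameter.

def inner_grad(theta, synthetic):
    # Gradient w.r.t. theta of the assumed inner loss 0.25 * (theta - synthetic)**4.
    return (theta - synthetic) ** 3

def unroll(synthetic, theta0=1.0, lr=0.1, steps=5):
    """Run `steps` gradient updates; return final theta and d(theta)/d(synthetic)."""
    theta, dtheta = theta0, 0.0  # theta0 does not depend on synthetic
    for _ in range(steps):
        # Second derivatives of the inner loss at the current point.
        d_grad_d_theta = 3.0 * (theta - synthetic) ** 2
        d_grad_d_syn = -3.0 * (theta - synthetic) ** 2
        # Chain rule through theta_next = theta - lr * inner_grad(theta, synthetic).
        dtheta = dtheta - lr * (d_grad_d_theta * dtheta + d_grad_d_syn)
        theta = theta - lr * inner_grad(theta, synthetic)
    return theta, dtheta

def grand_loss(synthetic, target=0.5):
    # Loss on the final student parameter after all unrolled updates.
    theta, _ = unroll(synthetic)
    return 0.5 * (theta - target) ** 2

def grand_loss_grad(synthetic, target=0.5):
    # d(grand_loss)/d(synthetic), flowing through every update in unroll().
    theta, dtheta = unroll(synthetic)
    return (theta - target) * dtheta

x, eps = 0.2, 1e-6
finite_diff = (grand_loss(x + eps) - grand_loss(x - eps)) / (2 * eps)
print(grand_loss_grad(x), finite_diff)  # the two values should agree closely
```

If that carries over to the real setup, the cost would come from keeping the graph of every step alive, not from derivatives of order higher than two. Is that the right way to read it?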