Open hanbinhu opened 4 years ago
The current backward timeline using register_backward_hook() will ignore the gradient computation time for the last layer.
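
A toy sketch of one way this can happen, in plain Python rather than PyTorch. It assumes (this is a guess about the mechanism, not confirmed from the code) that the timeline derives each layer's duration from the gap between consecutive hook firings, and that a hook fires right after its layer's gradient has been computed. The names `backward_hook_times` and `durations_from_hooks` are hypothetical, and the clock is simulated.

```python
# Toy model of hook-based backward timing; no PyTorch required.
# Assumption: each layer's hook fires right after that layer's gradient is computed.

def backward_hook_times(grad_costs, start=0.0):
    """Return (layer, simulated clock reading) for each hook firing.

    grad_costs lists per-layer gradient costs in forward order.
    Backward visits the layers in reverse, so the last layer runs first.
    """
    clock = start
    fired = []
    for layer in reversed(range(len(grad_costs))):
        clock += grad_costs[layer]    # gradient computation for this layer
        fired.append((layer, clock))  # hook fires afterwards
    return fired


def durations_from_hooks(fired, start=None):
    """Derive per-layer durations as gaps between consecutive hook firings."""
    durations = {}
    prev = start
    for layer, t in fired:
        if prev is not None:          # the first firing has no earlier timestamp
            durations[layer] = t - prev
        prev = t
    return durations


costs = [1.0, 2.0, 4.0]               # layers 0, 1, 2 (layer 2 is the last layer)
fired = backward_hook_times(costs)

hooks_only = durations_from_hooks(fired)             # no timestamp before backward
with_start = durations_from_hooks(fired, start=0.0)  # timestamp taken before backward
```

Under these assumptions `hooks_only` has no entry for layer 2, because nothing marks the moment backward began. Recording a timestamp just before the backward call, as `with_start` does, recovers the missing interval.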