Closed: IvanFei closed this issue 3 years ago
Hi,
When backward is called after the second forward pass, the newly computed gradients are accumulated with the gradients from the first forward pass instead of being refreshed.
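A minimal sketch of this accumulation behavior, using a hypothetical toy model (the names and shapes are illustrative, not from the repo):

```python
import torch

# Hypothetical one-layer model to illustrate gradient accumulation.
model = torch.nn.Linear(4, 1)
x1, x2 = torch.randn(2, 4), torch.randn(2, 4)

loss1 = model(x1).sum()  # first forward pass, builds graph 1
loss2 = model(x2).sum()  # second forward pass, builds graph 2

loss1.backward()  # populates each parameter's .grad from graph 1
loss2.backward()  # gradients are ADDED onto .grad, not overwritten

# Equivalently, (loss1 + loss2).backward() produces the same .grad,
# so two forward passes followed by one backward on the summed loss
# gives correct gradients for both losses.
```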
Hi,
Thank you for your reply. You are correct; I misunderstood the forward mechanism of PyTorch.
In PyTorch, when you compute the forward pass using different inputs, each output has its own computation graph attached to it; it is not overwritten by the next call.
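A small sketch showing that each forward call keeps its own graph (toy scalar tensors; values are illustrative):

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

out1 = w * 3.0  # first forward: graph 1, saves its own inputs
out2 = w * 5.0  # second forward: graph 2; graph 1 is untouched

# Each output still points at its own, distinct graph node.
print(out1.grad_fn, out2.grad_fn)  # two separate MulBackward0 objects

(out1 + out2).backward()  # one backward flows through both graphs
print(w.grad)             # tensor(8.) == 3 + 5, both paths contribute
```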
best wishes
Hi @backseason, I found that the joint training uses a schedule with two forward passes and one backward pass. It seems the network would not correctly compute the gradient of the edge-image loss: the second forward pass would refresh the network state from the first forward pass, yet that forward state is important for computing the network's gradients (see the chain rule below).
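For reference, the chain rule being invoked, stated generically (the original attachment is not reproduced here): for a loss $L$ on the network output $y = f(x; \theta)$,

$$\frac{\partial L}{\partial \theta} = \frac{\partial L}{\partial y} \cdot \frac{\partial y}{\partial \theta},$$

where $\partial y / \partial \theta$ is evaluated using the intermediate activations saved during that output's own forward pass.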