It seems that the performance is insensitive to this extra forward pass.
In my view, the model's BN layer parameters should first be updated via entropy minimization, and predictions made afterwards should then be more accurate. However, this implementation, which returns the output of the first forward pass as the prediction, confuses me.
Please see the published edition of the paper at ICLR'21, where we have updated the method regarding the number of forward passes:
In further experiments we found that the results are insensitive to re-forwarding after the update. In practice, tent often only requires a few updates to adapt to the shifts in our experiments, and so repeating inference is not necessary. The update on the last batch still improves prediction on the next batch. Note that this shows the adaptation learned by tent generalizes across target points, as it makes the prediction before taking the gradient, and so its improvement is not specific to each test batch (see this review comment for more discussion).
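For reference, the single-pass scheme reads roughly as follows. This is a condensed sketch of the forward-and-adapt step being discussed, assuming a PyTorch model and an optimizer that holds only the adaptable parameters (e.g. the BN affine parameters); the inlined entropy and exact structure are illustrative, not a verbatim copy of tent.py.

```python
import torch

@torch.enable_grad()  # make sure gradients are on, even inside an eval loop
def forward_and_adapt(x, model, optimizer):
    # One forward pass; these outputs double as the returned predictions.
    outputs = model(x)
    # Mean Shannon entropy of the softmax predictions over the batch.
    loss = -(outputs.softmax(1) * outputs.log_softmax(1)).sum(1).mean()
    loss.backward()        # gradients reach only the trainable parameters
    optimizer.step()       # e.g. update the BN scale/shift parameters
    optimizer.zero_grad()
    # The predictions come from the pre-update forward pass, so the
    # update improves the *next* batch rather than this one.
    return outputs
```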
Is the right order: forward, backward, and forward?
If you want to include the final forward pass, to have the most up-to-date predictions with respect to entropy minimization, you can simply add `outputs = self.model(x)` after the forward-and-adapt loop:
https://github.com/DequanWang/tent/blob/master/tent.py#L30-L31
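A hypothetical sketch of that variant, reusing the names and assumptions from the sketch above (the function name is illustrative, not from the repository): forward, backward, then a second forward so the predictions reflect the updated parameters.

```python
import torch

@torch.enable_grad()
def forward_and_adapt_reforward(x, model, optimizer):
    # First forward pass: used only to compute the adaptation loss.
    outputs = model(x)
    loss = -(outputs.softmax(1) * outputs.log_softmax(1)).sum(1).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    # Second forward pass: predictions now reflect the updated parameters,
    # at roughly 2x the inference cost per test batch.
    return model(x)
```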
Thank you for your question about adaptation with and without repeating inference!
Thank you! This helps me a lot.
As the paper says, tent needs 2× the inference time plus 1× the gradient time per test point, but I found only one forward pass and one gradient update in the code: https://github.com/DequanWang/tent/blob/03ac55c4fef0fda3eacb2f6414b099031e96d003/tent.py#L49