Open Antoninnnn opened 8 months ago
The reason is that, as you mentioned, we need to take the gradient of the regressor with respect to the noisy graph. We cannot differentiate through the extra features, so we prefer the regressor not to rely on them.
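To illustrate the point, here is a minimal PyTorch sketch, not the repository's actual code. The noisy graph is reduced to one-hot node features, and `extra_features`, `regressor`, and `w` are hypothetical stand-ins. It assumes the extra features are computed outside the autograd graph, so a regressor that relies only on them yields no gradient with respect to the noisy graph.

```python
import torch

torch.manual_seed(0)

n, d = 4, 3  # hypothetical sizes: 4 nodes, 3 node types

# Noisy graph G_t, reduced here to one-hot node features for brevity.
X_t = torch.nn.functional.one_hot(torch.tensor([0, 2, 1, 0]), d).float()
X_t.requires_grad_(True)

y = torch.tensor([1.0])  # target property


def extra_features(X):
    # Stand-in for hand-crafted features: computed from discrete labels
    # (argmax) and without gradient tracking, so nothing flows back to X.
    with torch.no_grad():
        return X.argmax(dim=-1).float().mean().reshape(1)


# Case 1: the regressor reads the noisy graph directly (differentiable path).
regressor = torch.nn.Linear(d, 1)
loss = ((regressor(X_t.mean(dim=0)) - y) ** 2).sum()
(grad,) = torch.autograd.grad(loss, X_t)

# Case 2: the regressor relies only on the extra features.
w = torch.nn.Parameter(torch.tensor([0.5]))
loss_extra = ((w * extra_features(X_t) - y) ** 2).sum()
(grad_extra,) = torch.autograd.grad(loss_extra, X_t, allow_unused=True)

print(grad.shape)   # gradient with the same shape as X_t
print(grad_extra)   # None: no path from the loss back to X_t
```

In the second case the guidance term would carry no information about how to change the graph, which is why a regressor that leans on such features is less useful for conditional sampling.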
Is it because of the effectiveness of the gradient $\nabla_{G_{t}}\|y'-y\|^{2}$? (We notice that `extra_features` are commonly used in the unconditional denoising model.)