Closed paulhuangkm closed 7 months ago
Hi -
Thanks for your question. One can definitely use a network to compute the energy and then take its gradient to get the Score (e.g., Saremi & Hyvarinen). If the network has no additive offsets (bias terms), it will be locally linear, as ours is. But if the network is built from convolutions and ReLU nonlinearities, the Score will be discontinuous (the derivatives of the rectifiers are step functions), so the derivatives of this Score will be zero within each linear region and infinite at the boundaries between regions! So you won't be able to analyze the Score by examining its gradient.
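To illustrate the point about ReLU energy networks, here is a minimal NumPy sketch. It assumes a toy two-layer, bias-free ReLU network with random weights (`W1`, `w2`, and the helper `fd_jacobian` are hypothetical names, not from the authors' code), and it ignores the sign convention relating the Score to the energy gradient, since that does not affect the argument.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 4))   # hypothetical hidden layer, no bias
w2 = rng.standard_normal(16)        # hypothetical linear readout, no bias

def energy(x):
    # Bias-free ReLU network: piecewise linear in x.
    return w2 @ np.maximum(W1 @ x, 0.0)

def score(x):
    # Gradient of the energy; constant within each ReLU activation region.
    mask = (W1 @ x > 0).astype(float)
    return W1.T @ (w2 * mask)

def fd_jacobian(f, x, eps=1e-6):
    # Central finite-difference Jacobian of f at x.
    cols = [(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(x.size)]
    return np.stack(cols, axis=1)

x = rng.standard_normal(4)
H = fd_jacobian(score, x)           # "Hessian" of the energy at x
print(np.abs(H).max())              # zero unless x sits on a region boundary
```

Away from the activation boundaries the Jacobian of the Score vanishes, so it carries no eigenstructure to analyze.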
Since our networks are trained directly on denoising, they compute the Score, which is locally linear. The gradient of the Score (its Jacobian) is thus the Hessian of the energy, which should be symmetric. Symmetry is not guaranteed for our networks, but we have shown empirically that the Jacobian is close to symmetric.
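A rough sketch of how local linearity and near-symmetry might be checked, assuming a toy bias-free ReLU map with random weights standing in for a trained denoiser (`W1`, `W2`, and `asymmetry` are hypothetical, not the authors' code). With random weights the asymmetry is not expected to be small; the point is only how one could measure it.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((32, 8))   # hypothetical weights, no bias terms
W2 = rng.standard_normal((8, 32))

def denoiser(y):
    # Bias-free ReLU network: locally linear in y.
    return W2 @ np.maximum(W1 @ y, 0.0)

def jacobian(y):
    # Exact Jacobian within the activation region containing y.
    mask = (W1 @ y > 0).astype(float)
    return W2 @ (mask[:, None] * W1)

def asymmetry(J):
    # Relative size of J - J^T; 0 means exactly symmetric.
    return np.linalg.norm(J - J.T) / np.linalg.norm(J)

y = rng.standard_normal(8)
J = jacobian(y)
print(np.allclose(denoiser(y), J @ y))  # local linearity: f(y) = J(y) y
print(asymmetry(J))                     # how far J is from symmetric
```

For a trained network one would obtain `J` by automatic differentiation rather than from explicit weights, then apply the same `asymmetry` measure.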
Thanks for the response!
Just to make sure I'm understanding it correctly: you are saying that we won't be able to mathematically analyze the eigenvectors of the Jacobian, or Hessian, in energy-based models (EBMs), because the first-order derivative is discontinuous. However, empirical results suggest that the DNN-estimated locally linear score functions act approximately like ideal denoising operators, in that their Jacobian matrices are nearly symmetric. Similar results should also be observed with EBMs, since the gradients of EBMs are also locally linear and their Hessian matrices should also be nearly symmetric (by empirical results). Is this correct?
Yes. If the EBM uses nonlinearities whose second derivatives are not zero, then a similar analysis can be done on its Hessian. In fact, in the case of an EBM, the Jacobian of the score is exactly symmetric, since the score comes from the gradient of the energy.
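As a small illustration of this reply, the sketch below assumes a toy energy network with a softplus nonlinearity (nonzero second derivative) and random weights; all names are hypothetical. The Hessian is written out analytically and compared against finite differences of the energy gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 4))   # hypothetical weights and biases
b1 = rng.standard_normal(16)
w2 = rng.standard_normal(16)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def energy(x):
    # softplus(z) = log(1 + exp(z)), a smooth stand-in for ReLU
    return w2 @ np.logaddexp(0.0, W1 @ x + b1)

def grad_energy(x):
    # The derivative of softplus is the sigmoid.
    return W1.T @ (w2 * sigmoid(W1 @ x + b1))

def hessian(x):
    # W1^T diag(w2 * sigmoid'(z)) W1: symmetric by construction.
    s = sigmoid(W1 @ x + b1)
    return W1.T @ ((w2 * s * (1.0 - s))[:, None] * W1)

def fd_jacobian(f, x, eps=1e-5):
    # Central finite-difference Jacobian of f at x.
    cols = [(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(x.size)]
    return np.stack(cols, axis=1)

x = rng.standard_normal(4)
H = hessian(x)
print(np.allclose(H, H.T))                                      # symmetric
print(np.allclose(H, fd_jacobian(grad_energy, x), atol=1e-6))   # matches finite differences
```

Here the Hessian is nonzero and symmetric (up to floating-point rounding), so its eigendecomposition is well defined, unlike the bias-free ReLU energy case.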
Oh, you're right. This makes a lot of sense to me. Thanks!
Dear Authors,
Thank you for your work! Your analyses on piecewise linear and DNN denoising operators are interesting. I'm curious about the applicability of your analyses to energy-based models where the score function is estimated via the gradient of the outputs w.r.t. the inputs. Could you shed some light on how similar mathematical frameworks or analyses might be extended or adapted to this context?
Thank you!