Relevant snippet from [Park et al. 2022] on Hessian computation:
"For Hessian max eigenvalue spectrum (Park & Kim, 2021), 10% of the training dataset is used. We also use power iteration with a batch size of 16 to produce the top-5 largest eigenvalues. To this end, we use the implementation of Yao et al. (2020). We modify the algorithm to calculate the eigenvalues with respect to L2 regularized NLL on augmented training datasets. In the strict sense, the weight decay is not L2 regularization, but we neglect the difference."
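For reference, the procedure the quote describes (power iteration for the top-5 largest eigenvalues) can be sketched as power iteration with deflation over Hessian-vector products. This is a minimal numpy sketch, not the actual Yao et al. (2020) code: `matvec` is a placeholder for the Hessian-vector product, which on a real model would be computed with autodiff (Pearlmutter's trick), and the batching over 10% of the training data is omitted.

```python
import numpy as np

def top_k_eigenvalues(matvec, dim, k=5, iters=500, tol=1e-9, seed=0):
    """Top-k eigenvalues (by magnitude) via power iteration with deflation.

    matvec(v) stands in for a Hessian-vector product; on a real model this
    would come from autodiff rather than an explicit matrix.
    """
    rng = np.random.default_rng(seed)
    eigvals, eigvecs = [], []
    for _ in range(k):
        v = rng.standard_normal(dim)
        v /= np.linalg.norm(v)
        lam = 0.0
        for t in range(iters):
            w = matvec(v)
            # Deflate the eigen-directions found so far so the iteration
            # converges to the next-largest-magnitude eigenvalue.
            for lam_j, v_j in zip(eigvals, eigvecs):
                w = w - lam_j * (v_j @ v) * v_j
            new_lam = v @ w  # Rayleigh quotient estimate
            norm = np.linalg.norm(w)
            if norm == 0.0:
                lam = new_lam
                break
            v = w / norm
            if t > 0 and abs(new_lam - lam) < tol * max(1.0, abs(new_lam)):
                lam = new_lam
                break
            lam = new_lam
        eigvals.append(lam)
        eigvecs.append(v)
    return eigvals
```

Note that power iteration returns eigenvalues in order of decreasing *magnitude*, so a large negative eigenvalue would show up early in the list if one exists.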
Missing negative eigenvalues can be caused by either the model training or the Hessian computation.
Thus, we need to validate:
[ ] that the training for our toy comparison replicates the [Park et al. 2022] instructions
[ ] that the parameters of our Hessian computation match theirs
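On the second point, one concrete parameter to check is which loss the Hessian is taken of: the quote specifies L2-regularized NLL. Worth noting (our inference, not stated in the paper): the regularizer (wd/2)·||θ||² adds wd·I to the Hessian, shifting every eigenvalue up by wd, which by itself can hide small negative eigenvalues. A minimal sketch, with `weight_decay` as a placeholder value:

```python
import numpy as np

def l2_regularized_nll(nll_value, params, weight_decay):
    """NLL plus (wd/2) * ||theta||^2, the loss the Hessian is taken of.

    The regularizer contributes weight_decay * I to the Hessian, shifting
    every eigenvalue up by weight_decay -- small negative eigenvalues of
    the plain NLL may therefore look non-negative in this spectrum.
    """
    sq_norm = sum(float(np.sum(p ** 2)) for p in params)
    return nll_value + 0.5 * weight_decay * sq_norm
```

Also recall the quote's caveat: decoupled weight decay is not L2 regularization, so this term never appears in the training loss itself; [Park et al. 2022] neglect that difference.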
Resources: