Closed: ayush1997 closed this issue 6 years ago
Hi, @ayush1997
That function computes the estimated inverse Hessian-vector product (HVP). `test_grad_loss` is the initial value for one inverse HVP estimate, and `cur_estimate` is the inverse HVP estimate that is updated recursively.
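The recursion described above can be sketched in a few lines of NumPy. This is a hypothetical minimal version, not the repo's actual API: `hvp_fn`, `v`, and `scale` are illustrative names, and the real code draws a fresh random batch for each Hessian-vector product.

```python
import numpy as np

def lissa_inverse_hvp(hvp_fn, v, recursion_depth, scale=10.0):
    """Minimal LiSSA sketch: estimate H^{-1} v via the recursion
    cur_estimate <- v + (I - H/scale) cur_estimate,
    which converges to (H/scale)^{-1} v when ||I - H/scale|| < 1."""
    cur_estimate = v.copy()  # initialized with test_grad_loss (= v)
    for _ in range(recursion_depth):
        cur_estimate = v + cur_estimate - hvp_fn(cur_estimate) / scale
    return cur_estimate / scale  # undo the scaling of H
```

On a toy problem with a known Hessian, e.g. `H = np.diag([2.0, 4.0])` and `hvp_fn = lambda x: H @ x`, the result approaches `np.linalg.solve(H, v)` as `recursion_depth` grows.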
If you have 1,000,000 training samples and set the approx params as:

```python
approx_params = {
    'num_repeats': 1,
    'recursion_depth': 500,
    'recursion_batch_size': 100
}
```

you will get an inverse HVP estimated from 500 * 100 training samples.
If you set the approx params as:

```python
approx_params = {
    'num_repeats': 2,
    'recursion_depth': 500,
    'recursion_batch_size': 100
}
```

you will get two inverse HVP estimates, and the output is the average of those vectors. The first estimate depends on the first set of randomly selected samples from your training set (again 500 * 100 samples); the second estimate depends on a different random selection. Therefore the initial value is the same for both repeats, but the final estimates are not.
Thanks @zironycho for the explanation!! I missed the averaging part:
```python
if inverse_hvp is None:
    inverse_hvp = np.array(cur_estimate) / ihvp_config['scale']
else:
    inverse_hvp += np.array(cur_estimate) / ihvp_config['scale']
inverse_hvp /= ihvp_config['num_repeats']
```
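Putting the two pieces together, the outer `num_repeats` loop can be sketched as below. This is a hypothetical, simplified version (the function and parameter names are illustrative, and the real code samples a new random batch inside each recursion step): `cur_estimate` is reset to `test_grad_loss` at the start of every repeat, and the per-repeat results are summed and then divided to form the average.

```python
import numpy as np

def get_inverse_hvp_lissa(hvp_fn, test_grad_loss, num_repeats,
                          recursion_depth, scale=10.0):
    """Sketch of the outer averaging loop: each repeat restarts the
    recursion from test_grad_loss, and the repeats are averaged."""
    inverse_hvp = None
    for _ in range(num_repeats):
        cur_estimate = test_grad_loss.copy()  # reset every repeat
        for _ in range(recursion_depth):
            # in the real code, hvp_fn would see a fresh random batch here
            cur_estimate = (test_grad_loss + cur_estimate
                            - hvp_fn(cur_estimate) / scale)
        if inverse_hvp is None:
            inverse_hvp = np.array(cur_estimate) / scale
        else:
            inverse_hvp += np.array(cur_estimate) / scale
    inverse_hvp /= num_repeats
    return inverse_hvp
```

With a deterministic `hvp_fn` every repeat is identical, so the average equals a single run; the repeats only differ (and averaging only helps) because each one sees different random batches.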
Thanks
Hi, I am a little confused about the calculation of `cur_estimate` in `_get_inverse_hvp_lissa()`.
Since we assign `test_grad_loss` to `cur_estimate` at the start of every `ihvp_config['num_repeats']` iteration, the `cur_estimate` computed would seem to be the same in every loop. Why is it assigned `test_grad_loss` inside the `ihvp_config['num_repeats']` loop?
Thanks.