darkonhub / darkon

Toolkit to Hack Your Deep Learning Models
http://darkon.io
Apache License 2.0

Regarding cur_estimate computation in _get_inverse_hvp_lissa() #47

Closed ayush1997 closed 6 years ago

ayush1997 commented 6 years ago

Hi, I am a little confused by the calculation of cur_estimate in _get_inverse_hvp_lissa().

Since test_grad_loss is assigned to cur_estimate at the start of every ihvp_config['num_repeats'] iteration, the cur_estimate computed would be the same in every loop. Why is it reset to test_grad_loss inside the ihvp_config['num_repeats'] loop?

Thanks.

zironycho commented 6 years ago

Hi, @ayush1997

That function computes the estimated inverse Hessian-vector product (inverse HVP).

test_grad_loss is the initial value for one inverse HVP estimate; cur_estimate is that estimate, updated recursively.

If you have a training set of 1,000,000 samples and set the approx params as:

approx_params = {
    'num_repeats': 1,
    'recursion_depth': 500,
    'recursion_batch_size': 100
}

You will get one inverse HVP estimated from 500 * 100 randomly selected training samples.

If you set the approx params as:

approx_params = {
    'num_repeats': 2,
    'recursion_depth': 500,
    'recursion_batch_size': 100
}

You will get two inverse HVP estimates, and the output is the average of those vectors. The first estimate is affected by the first randomly selected samples from your training set (again 500 * 100 samples in total), and the second estimate is affected by another set of randomly selected samples.

Therefore, the initial value is the same for every repeat, but the final estimated values differ because each repeat sees different samples.
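
In pseudocode, the recursion looks roughly like this. This is a minimal NumPy-style sketch of the general LiSSA update, not the exact darkon source; lissa_inverse_hvp, hvp_fn, and damping are placeholder names I'm using for illustration:

import numpy as np

def lissa_inverse_hvp(v, hvp_fn, num_repeats, recursion_depth, scale, damping=0.0):
    # v is the starting vector (test_grad_loss above); hvp_fn(x) is assumed to
    # return a Hessian-vector product H @ x estimated on a fresh random
    # mini-batch of `recursion_batch_size` training samples.
    inverse_hvp = None
    for _ in range(num_repeats):
        # every repeat restarts the recursion from the same initial value v
        cur_estimate = v
        for _ in range(recursion_depth):
            # LiSSA update: cur_estimate <- v + (1 - damping) * cur_estimate - (H @ cur_estimate) / scale
            cur_estimate = v + (1 - damping) * cur_estimate - hvp_fn(cur_estimate) / scale
        # accumulate one scaled estimate of H^{-1} v
        if inverse_hvp is None:
            inverse_hvp = np.array(cur_estimate) / scale
        else:
            inverse_hvp += np.array(cur_estimate) / scale
    # average the num_repeats independent estimates
    return inverse_hvp / num_repeats

Because hvp_fn draws new random batches at every step, each repeat produces a different estimate even though cur_estimate starts from the same v, and averaging over num_repeats reduces the variance.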

ayush1997 commented 6 years ago

Thanks @zironycho for the explanation!! I missed the averaging part:

# accumulate one scaled estimate per repeat
if inverse_hvp is None:
    inverse_hvp = np.array(cur_estimate) / ihvp_config['scale']
else:
    inverse_hvp += np.array(cur_estimate) / ihvp_config['scale']

# average over all repeats
inverse_hvp /= ihvp_config['num_repeats']

Thanks