Horse7354 opened this issue 4 years ago
Probably not. Thanks for flagging.
Can you please provide minimal code which produces this output so we can investigate?
If you are having more general problems with getting a differentiable optimizer to update an fmodel's parameters, would you mind filing a separate issue for that, with code, output/stack trace, and what you ideally expected to happen?
Got some minimal code:
import torch
import higher

model = torch.nn.Linear(5, 5)
model.fastparams = [model.bias]
model.inneropt = torch.optim.SGD([{'params': [model.bias], 'lr': 0.001}])
fmodel = higher.monkeypatch(model, copy_initial_weights=False)
print('INNEROPT', fmodel.inneropt.param_groups)  # params (the bias) show up
print('FASTPARAMS', fmodel.fastparams)  # params (the bias) show up
fmodel.diffopt = higher.optim.DifferentiableSGD(fmodel.inneropt, fmodel.fastparams, fmodel=fmodel)
print('DIFFOPT', fmodel.diffopt.param_groups)  # params are [None]
output is:
INNEROPT [{'params': [Parameter containing:
tensor([ 0.3529, -0.3059, -0.4128, 0.3909, -0.3499], requires_grad=True)], 'lr': 0.001, 'momentum': 0, 'dampening': 0, 'weight_decay': 0, 'nesterov': False}]
FASTPARAMS [Parameter containing:
tensor([ 0.3529, -0.3059, -0.4128, 0.3909, -0.3499], requires_grad=True)]
DIFFOPT [{'params': [None], 'lr': 0.001, 'momentum': 0, 'dampening': 0, 'weight_decay': 0, 'nesterov': False}]
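As a contrast (just a sketch on my side, not taken from the run above, and it assumes fmodel.parameters() exposes the patched module's fast weights), the parameters themselves should still be reachable through fmodel; it is only the differentiable optimizer's param_groups that shows None:

# Hedged follow-up sketch: the fast weights should still be reachable via the
# patched module, while diffopt.param_groups only carries the copied hyperparameters.
print('FMODEL PARAMS', list(fmodel.parameters()))  # weight and bias fast weights
print('DIFFOPT LR', fmodel.diffopt.param_groups[0]['lr'])  # 0.001, copied from inneropt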
If it helps, I notice this behaviour also happens in the standard use case:
model = torch.nn.Linear(5, 5)
inneropt = torch.optim.SGD([{'params': [model.bias], 'lr': 0.001}])
with higher.innerloop_ctx(model, inneropt) as (fmodel, diffopt):
    print('INNEROPT', inneropt.param_groups)  # params (the bias) show up
    print('DIFFOPT', diffopt.param_groups)  # params are [None]
outputs:
INNEROPT [{'params': [Parameter containing:
tensor([-0.2229, 0.0286, -0.2304, 0.3907, -0.2869], requires_grad=True)], 'lr': 0.001, 'momentum': 0, 'dampening': 0, 'weight_decay': 0, 'nesterov': False}]
DIFFOPT [{'params': [None], 'lr': 0.001, 'momentum': 0, 'dampening': 0, 'weight_decay': 0, 'nesterov': False}]
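For context, here is roughly what the surrounding inner-loop usage looks like (a sketch based on the diffopt.step(loss) pattern from the higher README, not output pasted from a real run):

import torch
import higher

model = torch.nn.Linear(5, 5)
inneropt = torch.optim.SGD(model.parameters(), lr=0.001)
x = torch.randn(8, 5)

with higher.innerloop_ctx(model, inneropt) as (fmodel, diffopt):
    loss = fmodel(x).pow(2).mean()
    diffopt.step(loss)  # differentiable inner update on the patched model
    print('DIFFOPT', diffopt.param_groups)  # inspect 'params' after the step
    print('FMODEL', list(fmodel.parameters()))  # the fast weights live on fmodel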
Also, I am using version 0.1.5, but I think this might also happen in the current version.
Hi, I am encountering some problems with getting a differentiable optimizer to update an fmodel's parameters. While trying to figure out the issue, I noticed that when I initialize the optimizer:

diffopt = higher.optim.DifferentiableSGD(other=inneropt, reference_params=fastparams, fmodel=fmodel)
# inneropt is an instance of torch.optim.SGD

diffopt.param_groups has [None, None] for all 'params', and does not contain the parameters from inneropt.param_groups. This is not the intended behaviour, correct?