IssamLaradji / sls

Implements stochastic line search

SLS and parameter groups for larger datasets? #3

Open · lessw2020 opened this issue 4 years ago

lessw2020 commented 4 years ago

I'm hitting an issue in using/testing it: the code seems to assume there are no parameter groups. From `utils.py`:

```python
def get_grad_list(params):
    return [p.grad for p in params]
```

This fails because `p.grad` lives inside each param group, i.e. you have to iterate like this:

```python
for group in self.param_groups:
    for p in group["params"]:
        if p.grad is None:
            continue
        # now you can access p.grad
```
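
For illustration, here is a minimal sketch of how `get_grad_list` could be adapted to walk `param_groups` instead of a flat parameter list (a hypothetical helper, not the repo's actual fix):

```python
def get_grad_list_from_groups(param_groups):
    """Collect gradients from every parameter in every param group.

    Hypothetical adaptation of utils.get_grad_list; the real fix in the
    repo may be structured differently.
    """
    grads = []
    for group in param_groups:
        for p in group["params"]:
            # p.grad is None for parameters that received no gradient
            grads.append(p.grad)
    return grads
```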

Is there a way to adjust the code to handle parameter groups? I'm trying to integrate SLS into FastAI, which by default creates two param groups. I'll see if I can avoid that, but param groups are quite common in most frameworks, so any tips here would be appreciated.

lessw2020 commented 4 years ago

So FastAI creates 2 param groups to split out l1 and l2 params. I've made a temporary function to avoid that:

```python
# Collection/List come from typing, nn from torch;
# trainable_params and uniqueify are fastai helpers.
def filter_all_params_no_split(layer_groups: Collection[nn.Module]) -> List[List[nn.Parameter]]:
    pure = []
    buffer = []
    for l in layer_groups:
        for c in l.children():
            buffer += list(trainable_params(c))
    pure += [uniqueify(buffer)]
    return pure
```

I'm now hitting other issues inside SLS, but I still think it's vital that SLS be able to handle param groups, since that's the default for most optimizer code.
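
Outside FastAI, the same idea can be shown generically by flattening an existing optimizer's param groups back into a single parameter list; a rough sketch (the helper name and setup are illustrative only):

```python
import torch

def flatten_param_groups(optimizer):
    """Return one flat list of parameters from all of an optimizer's param groups.

    Generic sketch; assumes a standard torch.optim.Optimizer with .param_groups.
    """
    flat = []
    for group in optimizer.param_groups:
        flat.extend(group["params"])
    return flat

# Example: an optimizer built with two parameter groups
model = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.Linear(8, 2))
opt = torch.optim.SGD([
    {"params": model[0].parameters()},
    {"params": model[1].parameters()},
], lr=0.1)
params = flatten_param_groups(opt)  # single list, as SLS originally expected
```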

IssamLaradji commented 4 years ago

You are right that we should include param groups to be consistent with other optimizers. We will add that by the end of this week. Thanks for pointing this out!
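
For reference, the usual PyTorch idiom that a param-group-aware `step()` tends to follow looks like the sketch below (an illustrative skeleton with a plain SGD update, not the actual SLS implementation):

```python
import torch

class SketchOptimizer(torch.optim.Optimizer):
    """Illustrative skeleton of the standard param_groups iteration idiom."""

    def __init__(self, params, lr=0.1):
        defaults = dict(lr=lr)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                # plain SGD update; SLS would instead run its line search here
                p.add_(p.grad, alpha=-group["lr"])
        return loss
```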

lessw2020 commented 4 years ago

Hi @IssamLaradji, that's great to hear! I'm hoping to get SLS fully integrated with FastAI2 so it's readily available as an optimizer choice, which should also help promote SLS. There is some tuning to be done, since FastAI2 by default does not expose a closure, wants to call loss.backward() itself, etc., but I'm hoping I can get that set up and integrated. I'd also love to use SLS on two projects I'm consulting on, so having the multi-param-group handling will definitely move that forward. Thanks again, and if you need any testing on the param-group implementation, let me know: I have a modified FastAI version largely set up to work with SLS already, so I can run on ImageWoof/Imagenette for fast testing.

IssamLaradji commented 4 years ago

Thanks a lot.

I added param_groups; let me know how that works for you! Thanks :)

lessw2020 commented 4 years ago

Excellent - testing it now!

lessw2020 commented 4 years ago

It's handling the param groups in the sense that it doesn't blow up like before. However, it's not actually learning anything (the loss ends up the same as random guessing, i.e. 10 classes = 10% accuracy).
I'm debugging now: `try_sgd_step` is being called, so it's passing back a step size, etc., but the step size doesn't seem to be changing, so it's not clear the weights are actually being updated.
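
One quick way to confirm whether the weights move at all is to snapshot the parameters around a single step; a debugging sketch (hypothetical helper, assuming a plain PyTorch-style loop rather than FastAI's internals):

```python
import torch

def step_moved_weights(model, optimizer, closure):
    """Run one optimizer step and report the largest parameter change.

    A result of ~0.0 means the update is effectively a no-op and the
    weights are not being touched.
    """
    before = [p.detach().clone() for p in model.parameters()]
    optimizer.step(closure)  # SLS-style optimizers take a closure that recomputes the loss
    with torch.no_grad():
        deltas = [(p - prev).abs().max().item()
                  for prev, p in zip(before, model.parameters())]
    return max(deltas)
```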

lessw2020 commented 4 years ago

[screenshot: sls_not_learning]

```
Layer Groups Len 1
Len Split_params = 2

Opt results 1
Sls (
  Parameter Group 0
    beta_b: 0.9
    beta_f: 2.0
    bound_step_size: True
    c: 0.1
    eta_max: 10
    gamma: 2.0
    init_step_size: 1
    line_search_fn: armijo
    lr: 0
    n_batches_per_epoch: 388
    reset_option: 1

  Parameter Group 1
    beta_b: 0.9
    beta_f: 2.0
    bound_step_size: True
    c: 0.1
    eta_max: 10
    gamma: 2.0
    init_step_size: 1
    line_search_fn: armijo
    lr: 0
    n_batches_per_epoch: 388
    reset_option: 1
)

Opt results 2
OptimWrapper over Sls (
  Parameter Group 0 (same hyperparameter values as Opt results 1)
  Parameter Group 1 (same hyperparameter values as Opt results 1)
)

True weight decay: False
```

lessw2020 commented 4 years ago

I'll pick it up again tomorrow and try to isolate it more. I can't tell exactly where it's going wrong at this point, but it's at least running now in FastAI with param groups, whereas earlier I couldn't get it running at all :)

IssamLaradji commented 4 years ago

Oh, thanks for testing. Could you pass me the script you used to reproduce the figure you generated with `learn.fit`? The issue is probably that I am not registering `step_size` for every param_group.
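
A hedged sketch of what per-group step-size bookkeeping could look like (hypothetical helpers; the actual SLS state layout may differ):

```python
def init_per_group_step_sizes(optimizer, init_step_size=1.0):
    """Give every param group its own step_size entry (hypothetical sketch)."""
    for group in optimizer.param_groups:
        group.setdefault("step_size", init_step_size)


def record_accepted_step_size(group, accepted_step_size):
    """Store the step size the line search accepted back onto its own group,
    so the next step's backtracking starts from a per-group value."""
    group["step_size"] = accepted_step_size
```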

lessw2020 commented 4 years ago

Hi @IssamLaradji - here's a relevant snippet, but I'm not sure how much it will help you. I had to make changes to three different FastAI files to get SLS to run, since FastAI doesn't expect to pass a closure, wants to call loss.backward() itself, etc.

```python
optar = partial(Sls, c=0.1, n_batches_per_epoch=n_epochs)  # , acceleration_method="polyak"
model = mxresnet50(c_out=10, sa=1)  # alternative: MixNet(input_size=256)
learn = Learner(data, model, metrics=[accuracy], wd=None,
                opt_func=optar, bn_wd=False, true_wd=False,
                loss_func=LabelSmoothingCrossEntropy())
learn.fit(2, 4e-3)
```

If you have TeamViewer, maybe we can do a quick call on Thursday and I can walk you through the whole thing? (I'm in Seattle, WA, PST.) Otherwise, I can debug more on Thursday and try to pin it down further. I may also simplify things and set it up in FastAI 1.9, which is nearly the FastAI 2.0 structure, and run with a basic ResNet to reduce the moving parts involved.

IssamLaradji commented 4 years ago

Hi @lessw2020, sorry, I am out of town and will be back later this week. We can use TeamViewer this coming Monday if you like! On another note, does FastAI implement L-BFGS? L-BFGS requires a closure to perform the line search, just like SLS.
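
For reference, the closure pattern `torch.optim.LBFGS` uses is the same interface SLS needs; a minimal sketch with a dummy model and data (names here are illustrative only):

```python
import torch

model = torch.nn.Linear(10, 2)
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.LBFGS(model.parameters(), lr=0.1)

def closure():
    # the optimizer may call this several times per step to re-evaluate the loss
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    return loss

optimizer.step(closure)
```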

lessw2020 commented 4 years ago

Hi @IssamLaradji - Monday works great. FastAI does not have L-BFGS. I've had some discussions with Jeremy about how FastAI v2 could support optimizers like SLS, AliG, etc. that require passing in a loss or closure, and I'm hoping to use SLS to drive those changes in the framework. I'll try to send you a PM on Facebook with my direct contact info.

IssamLaradji commented 4 years ago

Thanks @lessw2020, let's correspond there on Facebook :)