facebookresearch / schedule_free

Schedule-Free Optimization in PyTorch
Apache License 2.0

Do saved parameters need to be adjusted? #15

Closed FykAikawa closed 5 months ago

FykAikawa commented 5 months ago

Thank you for sharing a very good idea! I'm trying to pre-train a CNN model with these new schedule-free optimizers. My plan is to first pre-train a ResNet with your schedule-free AdamW, and then fine-tune it with PyTorch's default SGD optimizer. Because the two stages use different training frameworks (and because of my limited skills), I need to save the pretrained model after pre-training, load the parameters, and then use the other optimizer for fine-tuning.

Regarding the usage, I have a question. At the eval step, your schedule-free optimizer appears to "adjust" the weights by interpolating between x (the current parameters) and z (the state parameters). Does this mean that the desirable parameters at evaluation time are obtained by interpolating each parameter from x toward z? If so, the saved parameters, obtained by calling torch.save(model.state_dict()), should also be these interpolated parameters. Does your optimizer provide any function that performs this interpolation and should be called when saving parameters? Thank you.

adefazio commented 5 months ago

The parameters you want to save for later finetuning are the "x" sequence in the theory notation in the readme. This can be done by calling optimizer.eval() right before saving the model. Calling eval switches the model's weights from the training sequence y to the eval sequence x.
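For example, a minimal sketch of the save/resume flow (assuming the schedulefree pip package and a torchvision ResNet; the file name and hyperparameters here are placeholders):

```python
import torch
import torchvision
import schedulefree

model = torchvision.models.resnet18()
optimizer = schedulefree.AdamWScheduleFree(model.parameters(), lr=1e-3)

# ... pre-training loop runs with model.train() and optimizer.train() ...

# Switch the weights from the training sequence y to the eval sequence x,
# so that state_dict() holds the parameters you want to fine-tune later.
optimizer.eval()
torch.save(model.state_dict(), "resnet_pretrained.pt")

# If pre-training continues afterwards, switch back before the next step.
optimizer.train()
```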

FykAikawa commented 5 months ago

Thank you! Sorry for asking again, but I have another question. The "Caveats" section of the readme says this optimizer requires the BatchNorm layers' statistics to be updated by forwarding data before eval. When my model has BN, is this statistics update also needed before saving the parameters?

adefazio commented 5 months ago

Yes, you should also do the BN update before saving the model parameters.
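A hedged sketch of that step, continuing the earlier snippet (train_loader and the number of batches are placeholders; the idea is that with the eval weights x in place, a few forward passes in train mode recompute the BN running statistics for x rather than for the training weights y):

```python
optimizer.eval()  # put the eval sequence x into the model weights
model.train()     # BN layers update running_mean / running_var in train mode
with torch.no_grad():
    for i, (inputs, _) in enumerate(train_loader):
        model(inputs)
        if i >= 50:  # a modest number of batches of training data
            break

model.eval()
torch.save(model.state_dict(), "resnet_pretrained.pt")
```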

FykAikawa commented 5 months ago

I now understand the mechanism. Thank you!