facebookresearch / optimizers

For optimization algorithm research and development.
Other
442 stars 31 forks source link

replace lambda function #25

Closed OscarDPan closed 4 weeks ago

OscarDPan commented 1 month ago

I was using pytorch lightning package to do model training(with dist shampoo) and evaluation For some reason before the training the model object was picklable, but it wasn't after training. During trainer.eval() it complained that it's not able to pickle the lambda functions in shampoo_block_info.py.

Could we change the lambdas to named function? If not, could any expert shed light on why my model wasn't picklable after training?

facebook-github-bot commented 1 month ago

Hi @OscarDPan!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

facebook-github-bot commented 1 month ago

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

tsunghsienlee commented 1 month ago

Hi @OscarDPan ,

Thanks for your report, and in our experiment on vanilla PyTorch, this is not an issue but we never experimented that with lightning. Personally I am fine with using non-lambda function but I need to experiment something before confirming this change is benign. Before that, it seems you could use staticmethod to unblock yourself for now, and I will revisit this one later after my experiments.

tsunghsienlee commented 1 month ago

Hi @OscarDPan ,

I have tested this, and it seems quite a few tests related DDPBlockInfo will break because the inability of reassign allocate_zeros_tensor and get_tensor. If you run your code in DDP setting, this error might manifest. So far, our design has to follow this design now.

Recently I am going through some re-designs of BlockInfo class and hopefully the next design will get rid of this eventually, but until then, if what the solution you have now is working for you, please keep use that.

OscarDPan commented 4 weeks ago

@tsunghsienlee got it. Thank you for the updates!