Closed: OscarDPan closed this pull request 4 weeks ago.
Hi @OscarDPan!
Thank you for your pull request and welcome to our community.
In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.
For us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient, and your employer may need to sign the corporate CLA.
Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with `CLA signed`. The tagging process may take up to 1 hour after signing; please give it that time before contacting us about it.
If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
Hi @OscarDPan,
Thanks for your report. In our experiments on vanilla PyTorch this is not an issue, but we have never tried it with Lightning. Personally I am fine with using a non-lambda function, but I need to run some experiments before confirming this change is benign. In the meantime, it seems you could use `staticmethod` to unblock yourself, and I will revisit this after my experiments.
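For context, here is a minimal sketch of that workaround (the names `_allocate_zeros_tensor`, `BlockInfoSketch`, and `PicklableFns` are illustrative stand-ins, not the actual `shampoo_block_info.py` code): a lambda stored on an instance cannot be pickled, while a module-level named function or a `staticmethod` is serialized by reference and keeps the object picklable.

```python
import pickle
from dataclasses import dataclass
from typing import Callable

import torch


def _allocate_zeros_tensor(size, dtype, device):
    # Module-level named function: pickle stores it by reference
    # (module + qualified name), so instances holding it stay picklable.
    return torch.zeros(size, dtype=dtype, device=device)


@dataclass
class BlockInfoSketch:
    # Hypothetical stand-in for the real field, which defaults to a
    # lambda; the lambda default is what breaks pickling.
    allocate_zeros_tensor: Callable = _allocate_zeros_tensor


class PicklableFns:
    @staticmethod
    def get_tensor(t):
        # A staticmethod on a top-level class is also pickled by
        # reference (via its qualified name) in Python 3.4+.
        return t


pickle.dumps(BlockInfoSketch())       # succeeds with the named function
pickle.dumps(PicklableFns.get_tensor)  # succeeds with the staticmethod
```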
Hi @OscarDPan,
I have tested this, and it seems quite a few tests related to `DDPBlockInfo` will break because of the inability to reassign `allocate_zeros_tensor` and `get_tensor`. If you run your code in a DDP setting, this error might manifest, so for now we have to keep the current design.
I am currently going through some redesigns of the `BlockInfo` class, and hopefully the next design will get rid of this eventually. Until then, if the solution you have now is working for you, please keep using it.
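If per-instance reassignment is the blocker, one possible middle ground, purely a sketch that has not been checked against the `DDPBlockInfo` tests (`_get_tensor_on_rank` is a made-up helper), is `functools.partial`: it keeps a per-block binding the way a lambda closure does, but remains picklable as long as the wrapped function and bound arguments are picklable.

```python
import functools
import pickle


def _get_tensor_on_rank(tensor, rank):
    # Made-up helper standing in for whatever per-block logic the real
    # get_tensor lambda captures; only the binding mechanics matter here.
    return tensor


rank = 3

# A lambda closes over `rank` but cannot be pickled.
get_tensor = lambda t: _get_tensor_on_rank(t, rank)

# functools.partial carries the same per-instance binding and pickles
# fine, because partial objects serialize their function and arguments.
get_tensor = functools.partial(_get_tensor_on_rank, rank=rank)
pickle.dumps(get_tensor)  # succeeds
```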
@tsunghsienlee got it. Thank you for the updates!
I was using the PyTorch Lightning package to do model training (with Distributed Shampoo) and evaluation. For some reason the model object was picklable before training, but it wasn't afterwards. During `trainer.eval()` it complained that it was not able to pickle the lambda functions in `shampoo_block_info.py`.
Could we change the lambdas to named functions? If not, could any expert shed light on why my model wasn't picklable after training?
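For reference, here is a minimal reproduction of the symptom, independent of Shampoo or Lightning (`Holder` is just an illustrative stand-in for any object that ends up holding a lambda attribute after training):

```python
import pickle


class Holder:
    def __init__(self):
        # The lambda is a local object; pickle serializes functions by
        # reference and cannot look up "<lambda>", so dumping fails.
        self.get_tensor = lambda t: t


try:
    pickle.dumps(Holder())
except (pickle.PicklingError, AttributeError) as exc:
    print(f"cannot pickle: {exc}")
```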