Closed a-gardner1 closed 1 month ago
Thanks for letting us know @a-gardner1. We'll take a look and update on fix availability.
I can open a PR with a proposed fix if you like. I've already implemented one
Contributions are most welcome @a-gardner1 🙂
@ainoam In case you missed it, I did open a PR. Let me know if anything looks off
Thanks for the friendly nudge @a-gardner1. We'll try to address your PR soon.
Describe the bug
While there has clearly been some effort to keep pace with changes to Lightning (see #1033), the patching has fallen behind since the initial patches were created (https://github.com/allegroai/clearml/commit/64e10b2f62f59244750e73604836b57470a2f0d7) and new versions of Lightning were released. Unfortunately, it silently fails to apply patches to model saving and restoration, which can hide the fact that model logging doesn't fully work as expected. One of the two related (and nearly duplicate) patch methods is shown below (linked here).
Three `AttributeError`s exist in `_patch_pytorch_lightning_io` with newer versions of `pytorch-lightning`:
- `pytorch-lightning-0.10.0`: `Trainer.restore` was removed when `CheckpointConnector` was introduced and the `restore` method was no longer inherited from `TrainerIOMixin` (https://github.com/Lightning-AI/pytorch-lightning/commit/4724cdf5e0dc938ebff0a6d6b4477eec99326542)
- `pytorch-lightning-2.0.0`: `CheckpointConnector` was renamed to `_CheckpointConnector` (https://github.com/Lightning-AI/pytorch-lightning/pull/17008)
- `pytorch-lightning-2.1.0`: `_CheckpointConnector.save_checkpoint` was removed and inlined into `Trainer` (https://github.com/Lightning-AI/pytorch-lightning/pull/17408#discussion_r1170415577)

To reproduce
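The failure mode can be sketched without installing any particular `pytorch-lightning` version by using stand-in classes. The snippet below is only an illustration, not ClearML's code and not necessarily what the PR does: `resolve_first`, `patch_first`, `logging_wrapper`, `FakeTrainer` and `FakeCheckpointConnector` are hypothetical names. It tries each known location of a method in turn and emits a warning when none exists, instead of letting a broad `Exception` handler swallow the `AttributeError`.

```python
import functools
import warnings


def resolve_first(candidates):
    """Return the first (owner, attr_name) pair whose attribute exists, else None."""
    for owner, attr_name in candidates:
        if owner is not None and hasattr(owner, attr_name):
            return owner, attr_name
    return None


def patch_first(candidates, make_wrapper, label):
    """Wrap the first available target; warn (rather than stay silent) if none exists."""
    target = resolve_first(candidates)
    if target is None:
        warnings.warn(f"{label}: no patch target found, model logging is not hooked")
        return False
    owner, attr_name = target
    setattr(owner, attr_name, make_wrapper(getattr(owner, attr_name)))
    return True


calls = []  # records which wrapped methods were invoked


def logging_wrapper(original):
    """Hypothetical wrapper: record the call, then defer to the original method."""
    @functools.wraps(original)
    def wrapped(self, *args, **kwargs):
        calls.append(original.__name__)
        return original(self, *args, **kwargs)
    return wrapped


# Stand-ins mimicking the 2.1.0-style layout described above (not real Lightning classes).
class FakeTrainer:
    def save_checkpoint(self, filepath):
        return filepath


class FakeCheckpointConnector:
    pass  # no save_checkpoint here, as after the inlining into Trainer


# Try the older location first, then fall back to the newer one.
patched = patch_first(
    [(FakeCheckpointConnector, "save_checkpoint"), (FakeTrainer, "save_checkpoint")],
    logging_wrapper,
    "save_checkpoint",
)
```

With this layout the fallback finds `FakeTrainer.save_checkpoint` and wraps it. Asking for a target that exists nowhere (for example `restore` on these stand-ins) returns `False` and raises a visible warning rather than failing silently.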
No reproduction is necessary. There are multiple clear `AttributeError`s that get caught by the `Exception` handler depending on the `pytorch-lightning` version.

Expected behaviour
The checkpointing mechanism of `pytorch-lightning` should have been patched to enable automatic logging of models with ClearML.

Environment