allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0

Fix issue #1249 pytorch-lightning patches #1254

Closed. a-gardner1 closed this pull request 1 month ago

a-gardner1 commented 2 months ago

Related Issue \ discussion

See #1249.

Patch Description

This patch adds more precise handling of, and recovery from, the AttributeError raised when accessing module or class attributes that exist only in some versions of pytorch-lightning.
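To illustrate the kind of guarded attribute access described above, here is a minimal sketch. It is not the code from this patch: `patch_attribute`, `counting_wrapper`, and the two stand-in namespaces are hypothetical names, and `types.SimpleNamespace` objects are used in place of real pytorch-lightning modules so the example runs without the library installed.

```python
import types


def patch_attribute(owner, name, make_wrapper):
    """Wrap owner.<name> if it exists.

    Returns True when the attribute was patched and False when it is
    missing, so a missing attribute in one library version does not
    abort the remaining patches. (Hypothetical helper, for illustration.)
    """
    try:
        original = getattr(owner, name)
    except AttributeError:
        # Only the lookup is guarded: errors raised later by the wrapped
        # function itself are not swallowed here.
        return False
    setattr(owner, name, make_wrapper(original))
    return True


def counting_wrapper(original):
    """Return a wrapper that counts calls and delegates to the original."""
    def wrapped(*args, **kwargs):
        wrapped.calls += 1
        return original(*args, **kwargs)
    wrapped.calls = 0
    return wrapped


# Stand-ins for two library versions: one exposes the attribute, one does not.
with_attr = types.SimpleNamespace(load_from_checkpoint=lambda path: f"loaded {path}")
without_attr = types.SimpleNamespace()

patched_with = patch_attribute(with_attr, "load_from_checkpoint", counting_wrapper)
patched_without = patch_attribute(without_attr, "load_from_checkpoint", counting_wrapper)
```

Keeping the `try` block around the lookup alone means an AttributeError raised from inside the wrapped function still propagates, rather than being mistaken for a version mismatch.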

The patch also deduplicates some code between PatchPyTorchModelIO._patch_lightning_io and PatchPyTorchModelIO._patch_pytorch_lightning_io.
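One way such deduplication could look is sketched below, under the assumption that the two methods apply the same wrapping logic to two different package namespaces. The helper `_patch_io`, the hook names in `CHECKPOINT_HOOKS`, and the fake modules are all illustrative and do not come from the ClearML source.

```python
import types

# Illustrative attribute names; the real patch may target different ones.
CHECKPOINT_HOOKS = ("save_checkpoint", "load_from_checkpoint")


def _patch_io(module, wrap):
    """Shared logic: wrap every hook the module actually provides."""
    patched = []
    for name in CHECKPOINT_HOOKS:
        original = getattr(module, name, None)
        if original is None:
            continue  # hook absent in this version or package layout
        setattr(module, name, wrap(original))
        patched.append(name)
    return patched


def patch_lightning_io(module, wrap):
    # Thin entry point; all real work lives in the shared helper.
    return _patch_io(module, wrap)


def patch_pytorch_lightning_io(module, wrap):
    # Same helper, different target namespace.
    return _patch_io(module, wrap)


def tag(original):
    """Wrapper that marks results so patched calls are easy to spot."""
    def wrapped(*args, **kwargs):
        return ("patched", original(*args, **kwargs))
    return wrapped


# Fake namespaces standing in for the two packages.
fake_lightning = types.SimpleNamespace(
    save_checkpoint=lambda path: path,
    load_from_checkpoint=lambda path: path,
)
fake_pytorch_lightning = types.SimpleNamespace(save_checkpoint=lambda path: path)

lightning_patched = patch_lightning_io(fake_lightning, tag)
pl_patched = patch_pytorch_lightning_io(fake_pytorch_lightning, tag)
```

With the logic in one place, a fix for a version-specific attribute only has to be made once for both entry points.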

Testing Instructions

Without the patch applied, train a model with pytorch-lightning 2.0.0 or later and save a checkpoint; the checkpoint should be uploaded automatically to the ClearML server if the task is configured to do so. Resuming training from that checkpoint then fails, because the patch for loading models cannot be applied. With the patch applied, training should resume as expected.

Other Information