Task is already marked stopped when the callback from Task.register_abort_callback is called

Describe the bug

It is not possible to modify the task (e.g. update and upload a checkpoint) from a callback registered with Task.register_abort_callback, because the task is already marked as stopped by the time the callback runs.

Trying to save a checkpoint in the callback gives the following error:

2024-09-13 12:12:27,581 - clearml.model - WARNING - Could not update last created model in Task b281b21329e3470ebc8959e831f28ff8, Task status 'stopped' cannot be updated

To reproduce

from clearml import Task

# This snippet lives inside a method of the extended ModelCheckpoint,
# so `self` and `trainer` are captured from the enclosing scope.
def on_abort_callback() -> None:
    print("Saving last checkpoint")
    trainer.save_checkpoint(
        self.last_filepath,
        weights_only=self.save_weights_only,
    )

    # Ensure that the trainer stops gracefully
    trainer.should_stop = True

print("Registering model checkpoint abort callback")
Task.current_task().register_abort_callback(on_abort_callback)

Here, trainer is a PyTorch Lightning Trainer, and the callback is registered inside an extended Lightning ModelCheckpoint.

Expected behaviour

When a task is aborted, it should be possible to upload a model checkpoint to the ClearML server from within the abort callback function.

The current workaround is to mark the task as in_progress while saving the checkpoint and then mark it as stopped again afterwards. Not intuitive :-)
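For anyone hitting the same problem, here is a minimal sketch of that workaround. It assumes the status flip is done with Task.mark_started(force=True) and Task.mark_stopped(); this report does not state which calls were actually used, so treat those as an assumption. The helper names temporarily_in_progress and save_checkpoint_on_abort, as well as the FakeTask stand-in, are hypothetical and exist only so the snippet runs without a ClearML server.

```python
from contextlib import contextmanager


@contextmanager
def temporarily_in_progress(task):
    """Mark `task` as in progress for the duration of the block, then stop it again.

    Assumes `task` exposes mark_started(force=...) and mark_stopped(),
    as a ClearML Task is expected to.
    """
    task.mark_started(force=True)
    try:
        yield task
    finally:
        # Always restore the stopped status, even if saving fails.
        task.mark_stopped()


def save_checkpoint_on_abort(task, save_fn):
    """Run `save_fn` (e.g. the checkpoint save) while the task is writable."""
    with temporarily_in_progress(task):
        save_fn()


class FakeTask:
    """Hypothetical stand-in for a ClearML Task that only records status changes."""

    def __init__(self):
        self.status = "stopped"
        self.history = []

    def mark_started(self, force=False):
        self.status = "in_progress"
        self.history.append(self.status)

    def mark_stopped(self, force=False):
        self.status = "stopped"
        self.history.append(self.status)


if __name__ == "__main__":
    demo_task = FakeTask()
    save_checkpoint_on_abort(demo_task, lambda: print("Saving last checkpoint"))
    print(demo_task.history)
```

In the real callback, the FakeTask would be replaced by Task.current_task() and save_fn would wrap the trainer.save_checkpoint(...) call from the reproduction snippet.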

Environment

Server type: self hosted
ClearML SDK Version: 1.16.4
ClearML Server Version: 1.15.0
Python Version: 3.10.13
OS (Windows / Linux / macOS): Linux

Related Discussion

https://clearml.slack.com/archives/CTK20V944/p1726571061754989
