The Hugging Face Trainer class is integrated with ClearML. When trainer.train() finishes successfully, the trainer calls task.close(), which makes the original ClearML task unavailable. I am referring to this line specifically (permalink).
To reproduce
from clearml import Task
from trl import SFTTrainer  # SFTTrainer subclasses transformers.Trainer

task = Task.init(
    project_name='project',
    task_name='task',
)
...
model = ...
dataset = ...
...
trainer_args = ...
trainer = SFTTrainer(
    model,
    train_dataset=dataset,
    args=trainer_args,
)
print(task.status)  # Running
trainer.train()
print(task.status)  # Completed
# The task object is now unusable for most purposes
Expected behaviour
The main task should not be closed (and thereby made unavailable) when training finishes. This is especially important when there are multiple trainer runs or when custom actions are taken after training.
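To make the expected behaviour concrete, here is a minimal sketch of one way a callback could decide whether to close a task: it closes only a task it created itself and leaves a task passed in by the caller open. The names FakeTask, LoggingCallback, and train are hypothetical stand-ins written for illustration; they are not the ClearML or transformers APIs, and this is not a description of how the real integration is implemented.

```python
class FakeTask:
    """Hypothetical stand-in for a ClearML task; only tracks open/closed state."""

    def __init__(self, name):
        self.name = name
        self.status = "running"

    def close(self):
        self.status = "completed"


class LoggingCallback:
    """Hypothetical trainer callback that closes only the tasks it created."""

    def __init__(self, existing_task=None):
        # Remember who owns the task: the caller or this callback.
        self.owns_task = existing_task is None
        self.task = existing_task or FakeTask("auto-created")

    def on_train_end(self):
        # Close the task only when the callback created it itself.
        if self.owns_task:
            self.task.close()


def train(callback):
    """Stand-in for trainer.train(): just fires the end-of-training hook."""
    callback.on_train_end()


# A task created by the user survives any number of training runs.
user_task = FakeTask("task")
train(LoggingCallback(existing_task=user_task))
train(LoggingCallback(existing_task=user_task))
print(user_task.status)  # running

# A task created by the callback is still closed when training ends.
own = LoggingCallback()
train(own)
print(own.task.status)  # completed
```

With this ownership rule, a user-created task stays usable for further trainer runs or custom post-training steps, while tasks created implicitly by the callback are still cleaned up.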
Environment
Independent