Open meanna opened 1 year ago
OK, fixed using task.mark_started()
It won't fix everything, though. With this bug I cannot log tables and other things, and the console output in ClearML stops showing progress after the task is closed, which is bad.
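For anyone landing here, the `task.mark_started()` workaround mentioned above might look roughly like the sketch below. The helper name `reopen_and_upload` and the artifact name and path in the usage comment are made up for illustration. The `force` flag is passed through only in case the server refuses the status change, and whether it is needed is an assumption. As noted, this only covers artifact upload, not the console or table logging.

```python
def reopen_and_upload(task, name, artifact, force=False):
    """Put a completed task back into a started state, then attach an artifact.

    `task` is expected to behave like a clearml.Task, i.e. to provide
    mark_started() and upload_artifact(). The helper itself is hypothetical.
    """
    # Move the task out of the "completed" status so the server accepts updates.
    task.mark_started(force=force)
    # Upload the artifact; the return value of upload_artifact is passed through.
    return task.upload_artifact(name, artifact_object=artifact)


# Hypothetical usage after trainer.train() has returned:
#   from clearml import Task
#   task = Task.current_task()
#   reopen_and_upload(task, "model", "path/to/model_dir")
```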
Hi @meanna !
Originally, this was done so you wouldn't override anything when running training twice from a notebook. In a notebook environment, the task can't know when to properly close, unless it is done manually.
That said, I think it makes little sense to have this in there in hindsight. If a notebook user wants to rerun training, they should manually close the task themselves.
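To illustrate what "manually close the task" could look like for a notebook user, here is a minimal sketch. The helper `train_and_close` is hypothetical, and it assumes the task object exposes `close()` as `clearml.Task` does.

```python
def train_and_close(task, train_fn):
    """Run one training call and always close the task afterwards.

    `task` is expected to behave like a clearml.Task (it needs close());
    `train_fn` is any zero-argument callable, e.g. trainer.train.
    """
    try:
        return train_fn()
    finally:
        # Close explicitly so a rerun in the same notebook starts from a clean state.
        task.close()


# Hypothetical notebook usage (project and task names are placeholders):
#   from clearml import Task
#   task = Task.init(project_name="my_project", task_name="my_run")
#   train_and_close(task, trainer.train)
```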
So there are 2 options to fix this:
I think it makes sense to just go with option 2. The notebook use case is suboptimal either way, and this way it won't get in users' way. What do you think?
Describe the bug
I use the Hugging Face Trainer (from the transformers library) in my code. After training is done, I want to upload a model artifact, but it is not possible. I get this error:
Action failed <400/110: tasks.add_or_update_artifacts/v2.10 (Invalid task status: expected=created, status=completed)> (task=208a7835726347c59c1666302f0b9a81, artifacts=[{'key': 'model', 'type': 'string', 'uri': 'http://clearml.gpu.fra.ics.inovex.io:8081/KG_QA/fine-tune%20roberta.208a7835726347c59c1666302f0b9a81/artifacts/model/model.txt', 'content_size': 10, 'hash': 'b0d6dcfed49bb9415ec067e9d8969219c62176d9ce44da5a1fe672634112792d', 'timestamp': 1680620905, 'type_data': {'preview': 'merges.txt', 'content_type': 'text/plain'}}], force=True)
This seems to be because the task status is completed. Also, it seems the transformers library is connected to another ClearML task; see: https://github.com/huggingface/transformers/blob/main/src/transformers/integrations.py
To reproduce
I tried to add ClearML to this code: https://github.com/huggingface/transformers/blob/main/examples/pytorch/question-answering/run_qa.py