Open johnml1135 opened 1 year ago
Hi @johnml1135 ! I think that the logger is not flushed properly on program exit.
Can you try doing self.clearml_task.get_logger().flush(wait=True)
at the end of the script until we fix this?
We ended up finding another way around it - thank you for figuring out what was going on - we may need it again in the future.
That fix does not appear to work.
The first main issue (that the job closes at training end) is here in Hugging face Transformers: https://github.com/huggingface/transformers/blob/0ebee8b93358b6ef0182398b8fcbd7afd64c0f97/src/transformers/integrations/integration_utils.py#L1488-L1493
I made a pull request to resolve the issue in Hugging face: https://github.com/huggingface/transformers/pull/26763 For a temp fix you can use the solution from here: https://github.com/sillsdev/serval/issues/44#issuecomment-1758761393
The other aspect of this issue is that when reopening a task, you can actually write scalars, etc. to it.
Describe the bug
The prevalue is registered, but not the postvalue