allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0
5.61k stars, 651 forks

Add timeout on program exit #1085

Open artemisart opened 1 year ago

artemisart commented 1 year ago

Some of my scripts currently never finish because clearml gets stuck waiting for events at exit. I'm not sure why it happens, but there should be a timeout anyway.
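To illustrate the kind of timeout meant here, below is a minimal standard-library sketch, not ClearML's actual exit handling. It assumes the blocking step can be wrapped in a plain callable; `run_with_timeout`, `fast_flush` and `stuck_flush` are hypothetical names used only for this example.

```python
import threading
import time


def run_with_timeout(fn, timeout_sec):
    """Run fn in a daemon thread and wait at most timeout_sec for it.

    Returns True if fn finished in time, False if the wait timed out.
    """
    done = threading.Event()

    def _target():
        try:
            fn()
        finally:
            # Signal completion even if fn raised an exception.
            done.set()

    # A daemon thread does not keep the interpreter alive at exit,
    # so a stuck fn cannot block shutdown once we stop waiting.
    worker = threading.Thread(target=_target, daemon=True)
    worker.start()
    return done.wait(timeout_sec)


def fast_flush():
    # Hypothetical stand-in for a flush that completes quickly.
    time.sleep(0.01)


def stuck_flush():
    # Hypothetical stand-in for a flush that hangs on pending events.
    time.sleep(2)


finished_fast = run_with_timeout(fast_flush, timeout_sec=1.0)
finished_stuck = run_with_timeout(stuck_flush, timeout_sec=0.2)
print(finished_fast, finished_stuck)  # prints: True False
```

The trade-off of this approach is that anything still pending when the timeout fires is abandoned, so events or uploads may be lost.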

jkhenning commented 1 year ago

Hi @artemisart, this seems like a bug. Do you have any idea why it is pending for so long? Is it still uploading something?

artemisart commented 1 year ago

No, I don't know, sorry. Do you have any ideas on how to investigate this?
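One generic way to investigate a hang at exit, independent of ClearML, is Python's standard `faulthandler` module, which can dump the stack of every thread if the process is still alive after a delay. The sketch below assumes the hang happens after the last line of the script; `arm_exit_watchdog` and `disarm_exit_watchdog` are hypothetical helper names.

```python
import faulthandler
import sys


def arm_exit_watchdog(timeout_sec=60.0, file=None):
    """Dump all thread tracebacks if the process is still running
    timeout_sec seconds from now. The process is not killed."""
    if file is None:
        file = sys.stderr
    # faulthandler writes directly to the file descriptor, so `file`
    # must be a real file object and must stay open until the dump.
    faulthandler.dump_traceback_later(timeout_sec, exit=False, file=file)


def disarm_exit_watchdog():
    """Cancel a pending dump, e.g. when shutdown completed normally."""
    faulthandler.cancel_dump_traceback_later()
```

Calling `arm_exit_watchdog()` as the last statement of the script should print, after the delay, where each thread is blocked, which would show whether the process is waiting on an upload, a queue, or something else.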

jkhenning commented 1 year ago

Which ClearML calls did you make in your code?

artemisart commented 1 year ago

I'll try to reproduce it, but it may have been an issue with reporting (matplotlib) figures created with seaborn (https://seaborn.pydata.org/).

jkhenning commented 1 year ago

@artemisart is this still relevant?