NVIDIA / NeMo-Run

A tool to configure, launch and manage your machine learning experiments.
Apache License 2.0

Bypass errors during run.Experiment __del__ #10

Closed. hemildesai closed this 3 months ago

hemildesai commented 3 months ago

Thanks! Just wondering why that logic is needed, and in which cases it might fail (where we should still ignore the failure)?

The __del__ method just does some cleanup, such as removing existing SSH tunnels, whenever the Experiment object is garbage collected. But some objects are garbage collected before the Experiment itself, which leads to the exceptions. So I think it's pretty safe to have the try...except in __del__ but not in cleanup (as cleanup is called in other places too, where errors should still surface).
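To make the distinction concrete, here is a minimal sketch of the pattern. It is not the actual NeMo-Run code: the `Tunnel` and `BrokenTunnel` classes and the `tunnels` attribute are hypothetical stand-ins, and only the names `Experiment`, `cleanup` and `__del__` come from the discussion above.

```python
class Tunnel:
    """Hypothetical stand-in for an SSH tunnel held by the experiment."""

    def __init__(self):
        self.is_open = True

    def close(self):
        self.is_open = False


class BrokenTunnel:
    """Hypothetical tunnel whose resources are already gone, e.g. because
    it was garbage collected or torn down before the Experiment."""

    def close(self):
        raise RuntimeError("tunnel already torn down")


class Experiment:
    """Illustrative only; not the real run.Experiment."""

    def __init__(self, tunnels):
        self.tunnels = list(tunnels)

    def cleanup(self):
        # No try/except here: cleanup is also called explicitly from
        # other places, and those callers should see real failures.
        for tunnel in self.tunnels:
            tunnel.close()
        self.tunnels = []

    def __del__(self):
        # Best-effort cleanup during garbage collection. Objects that
        # cleanup relies on may already be gone, so errors are ignored.
        try:
            self.cleanup()
        except Exception:
            pass
```

With this shape, an explicit `cleanup()` call still raises if a tunnel cannot be closed, while the same failure during garbage collection is silently skipped. Python already ignores exceptions raised inside `__del__` and only prints a warning to stderr, so the try...except mainly keeps that noise out of the output.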