ESMCI / cime

Common Infrastructure for Modeling the Earth
http://esmci.github.io/cime

CIME should be able to detect hangs #4553

Open jgfouca opened 9 months ago

jgfouca commented 9 months ago

This would work like NODE_FAIL_REGEX and MPI_FAIL_REGEX, with a new setting named something like HANG_TIMEOUT_SEC. Have a thread in case_run watch the modification timestamps of all the log files. If none of them has been updated in the last HANG_TIMEOUT_SEC seconds, consider the job hung.
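A minimal sketch of the idea in Python, to make the proposal concrete. It is not existing CIME code: `newest_mtime`, `is_hung`, `HangWatcher`, `on_hang`, and `poll_sec` are hypothetical names, and the log file locations are passed in as glob patterns because how case_run would find them is not specified here. The sketch also assumes that a job with no log output yet should be timed from when the watcher started.

```python
import glob
import os
import threading
import time


def newest_mtime(log_globs):
    """Return the latest modification time among matching files, or None."""
    mtimes = []
    for pattern in log_globs:
        for path in glob.glob(pattern):
            try:
                mtimes.append(os.path.getmtime(path))
            except OSError:
                # The file was removed or rotated between glob and stat.
                pass
    return max(mtimes) if mtimes else None


def is_hung(log_globs, hang_timeout_sec, start_time, now=None):
    """True if no log was updated in the last hang_timeout_sec seconds.

    Assumption: activity is measured from start_time when no log file
    exists yet or when every log is older than the start of the run.
    """
    now = time.time() if now is None else now
    latest = newest_mtime(log_globs)
    last_activity = start_time if latest is None else max(latest, start_time)
    return (now - last_activity) > hang_timeout_sec


class HangWatcher(threading.Thread):
    """Hypothetical watcher thread: calls on_hang() once if the logs go quiet."""

    def __init__(self, log_globs, hang_timeout_sec, on_hang, poll_sec=30.0):
        super().__init__(daemon=True)
        self.log_globs = log_globs
        self.hang_timeout_sec = hang_timeout_sec
        self.on_hang = on_hang
        self.poll_sec = poll_sec
        self._stop_event = threading.Event()

    def run(self):
        start_time = time.time()
        # Event.wait returns False on timeout, True once stop() was called.
        while not self._stop_event.wait(self.poll_sec):
            if is_hung(self.log_globs, self.hang_timeout_sec, start_time):
                self.on_hang()
                return

    def stop(self):
        """Call when the model run finishes normally."""
        self._stop_event.set()
```

In this sketch the caller would start the watcher before launching the model and call `stop()` once the run returns. What `on_hang` should do (kill the job, mark the test as failed, resubmit) is left open, as is the choice of polling interval.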

github-actions[bot] commented 6 months ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.