Add a hang-detection setting, something like HANG_TIMEOUT_SEC, similar to NODE_FAIL_REGEX and MPI_FAIL_REGEX. A thread in case_run would watch the modification timestamps of all the log files; if none of them has been updated in the last HANG_TIMEOUT_SEC seconds, the job is considered hung.
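A minimal sketch of what such a watchdog thread could look like, assuming case_run is Python and the log files can be located with glob patterns. HANG_TIMEOUT_SEC is only the proposed setting name; `HangWatchdog`, `newest_mtime`, `on_hang` and `poll_sec` are hypothetical names for illustration, not existing CIME code.

```python
import glob
import os
import threading
import time


def newest_mtime(log_patterns):
    """Return the latest modification time among files matching the glob
    patterns, or None if no file matches yet."""
    mtimes = []
    for pattern in log_patterns:
        for path in glob.glob(pattern):
            try:
                mtimes.append(os.path.getmtime(path))
            except OSError:
                # The file may be rotated or removed between glob and stat.
                pass
    return max(mtimes) if mtimes else None


class HangWatchdog(threading.Thread):
    """Hypothetical watchdog: declares the job hung when no log file has been
    modified for hang_timeout_sec seconds (the proposed HANG_TIMEOUT_SEC)."""

    def __init__(self, log_patterns, hang_timeout_sec, on_hang, poll_sec=10.0):
        super().__init__(daemon=True)
        self.log_patterns = log_patterns
        self.hang_timeout_sec = hang_timeout_sec
        self.on_hang = on_hang
        self.poll_sec = poll_sec
        self._stop_event = threading.Event()
        self.hung = False

    def stop(self):
        """Ask the watchdog to exit, e.g. once the model run has finished."""
        self._stop_event.set()

    def run(self):
        start = time.time()
        # Event.wait returns False on timeout, so this polls every poll_sec
        # seconds until stop() is called.
        while not self._stop_event.wait(self.poll_sec):
            latest = newest_mtime(self.log_patterns)
            # Logs older than the watchdog itself (or no logs at all) count
            # as activity at start time, so stale files don't trigger at once.
            last_activity = start if latest is None else max(latest, start)
            if time.time() - last_activity > self.hang_timeout_sec:
                self.hung = True
                self.on_hang()
                return
```

In case_run the watchdog would be started just before the model is launched, with `on_hang` killing the run (for example via `subprocess.Popen.kill` on the launched process), and stopped once the run returns. Polling mtimes keeps the check independent of the model and MPI launcher, at the cost of a false positive if a healthy run legitimately writes nothing for longer than HANG_TIMEOUT_SEC.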