This issue comes up mostly in training, but I think the change would help with any job that fails without an error message. When a training run fails and doesn't produce an error message (input, runtime, resource, etc.), it would be nice if the log file (if available) were still produced. This is particularly helpful because failed and completed jobs don't leave records. If the log file were still produced, it would be easier to troubleshoot, whether that means adjusting the workflow or addressing something else happening behind the scenes.