Improved framework messaging for POD failures

NOAA-GFDL / MDTF-diagnostics

Analysis framework and collection of process-oriented diagnostics for weather and climate simulations

https://mdtf-diagnostics.readthedocs.io/en/main/



Improved framework messaging for POD failures #666

Open: aradhakrishnanGFDL opened this issue 3 months ago

aradhakrishnanGFDL commented 3 months ago

What problem will this feature solve?
Improved framework messaging, so that new users can help with triaging POD failures.

Describe the solution you'd like
A suggestion based on the forcing_feedback (ff) POD exercise: when a POD fails, the framework should explicitly print helpful messages at the bottom of the error output. These messages would tell users to refer to the log files (giving their paths) and hint at whether the problem likely lies within the POD itself or in data preprocessing, depending on the case. Some version of this, or a pointer to the Sphinx documentation where these steps are explained in detail, would be helpful. Something to consider.
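A minimal sketch of the proposed footer, assuming the framework knows the POD name, the path to the POD's log file, and roughly which stage failed when it reports the error. The function `format_failure_footer`, its `stage` labels, and the hint wording are hypothetical; only the documentation URL comes from this issue.

```python
from pathlib import Path

# Documentation root taken from the repository description.
DOCS_URL = "https://mdtf-diagnostics.readthedocs.io/en/main/"


def format_failure_footer(pod_name: str, log_path: Path, stage: str) -> str:
    """Build a hint block to print after a POD's error messages.

    `stage` is a hypothetical label for where the failure happened:
    "preprocessing" (input data preparation) or "pod" (the POD's own script).
    Unknown labels fall back to a generic hint.
    """
    hints = {
        "preprocessing": "The failure occurred while preparing input data; "
                         "check variable names, paths, and date ranges in your configuration.",
        "pod": "The failure occurred inside the POD script; "
               "check the POD's printed output and final stack trace in the log.",
    }
    hint = hints.get(stage, "Check the log file for details.")
    return "\n".join([
        f"POD '{pod_name}' failed. Further details are in the log file:",
        f"    {log_path}",
        hint,
        f"Troubleshooting guide: {DOCS_URL}",
    ])


if __name__ == "__main__":
    # Illustrative path only; real log locations depend on the run configuration.
    print(format_failure_footer("forcing_feedback", Path("/tmp/wkdir/forcing_feedback.log"), "pod"))
```

Keeping the hint text in a small lookup table makes it easy to add stages later, or to replace the generic fallback with a link to a specific Sphinx page.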

wrongkindofdoctor commented 3 months ago

@aradhakrishnanGFDL It is probably beneficial to advise POD developers to use detailed print statements. The framework cannot capture logging information directly from the subprocessruntimemanager without more sophisticated logging features, and those may be a large ask for POD developers to implement. The framework can only tell whether the POD subprocess fails or succeeds, and the subprocessruntimemanager messaging reflects this. Any POD logging goes through a logger attached to the pod object, and it is limited to whatever the POD prints to the terminal plus the final POD stack trace.
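To illustrate this limitation, here is a minimal sketch, using only Python's standard `subprocess` module, of what a parent process can observe from a child script: the exit status and whatever the child writes to stdout/stderr. It is not MDTF's actual subprocessruntimemanager; the function `run_pod_script` is hypothetical.

```python
import subprocess
import sys


def run_pod_script(cmd: list[str], tail_lines: int = 20) -> tuple[bool, str]:
    """Run a POD script as a child process and report what the parent can see.

    The parent only learns the exit status and the text the child printed.
    Logger calls inside the child are invisible unless they reach stdout/stderr.
    stderr is merged into stdout so lines keep their relative order, provided
    the child flushes its prints (e.g. print(..., flush=True)).
    """
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    output = result.stdout.splitlines()
    # Success or failure is all the exit code tells us; keep only the last lines.
    return result.returncode == 0, "\n".join(output[-tail_lines:])


if __name__ == "__main__":
    ok, tail = run_pod_script(
        [sys.executable, "-c", "print('reading input', flush=True); raise FileNotFoundError('missing.nc')"]
    )
    print("succeeded:", ok)
    print(tail)
```

Because block-buffered stdout can appear after the stack trace when output is piped, flushing print statements in the POD is a cheap way to keep the captured output readable.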

aradhakrishnanGFDL commented 3 months ago

Yes, adding this to the POD developer best practices would be nice. The POD in question did have a print statement. My proposal is not to give a detailed error message, but to give users some pointers after a deactivation like the one below, assuming MDTF regains control after the subprocess call (e.g., when it crawls through the output). Just a simple "Please refer to this documentation on how to find the log files to troubleshoot the issue." Does that make sense?

ERROR: Missing '$WORK_DIR/forcing_feedback/model/forcing_feedback_maps_IRF.png'.
ERROR: Deactivated <#1BUm:forcing_feedback> due to MDTFFileNotFoundError("[Errno 2] No such file or directory: 'Missing 11 files.'").
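A minimal sketch of the proposed behavior, assuming the framework regains control after the subprocess call and has a list of expected output files plus the POD's log path. The function `check_pod_outputs` and its message wording are hypothetical, loosely mirroring the error lines above; only the documentation URL comes from this issue.

```python
from pathlib import Path
from typing import List, Optional

# Documentation root taken from the repository description.
DOCS_URL = "https://mdtf-diagnostics.readthedocs.io/en/main/"


def check_pod_outputs(pod_name: str, expected: List[Path], log_path: Path) -> Optional[str]:
    """Return a deactivation message if any expected output is missing, else None.

    After listing the missing files, the message ends with a pointer to the
    POD log and to the documentation, as proposed in this issue.
    """
    missing = [p for p in expected if not p.exists()]
    if not missing:
        return None
    lines = [f"ERROR: Missing '{p}'." for p in missing]
    lines.append(f"ERROR: Deactivated {pod_name}: missing {len(missing)} of {len(expected)} expected files.")
    lines.append(f"Please check the POD log file for the cause: {log_path}")
    lines.append(f"See the documentation on locating and reading log files: {DOCS_URL}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative paths only.
    msg = check_pod_outputs(
        "forcing_feedback",
        [Path("forcing_feedback_maps_IRF.png")],
        Path("forcing_feedback.log"),
    )
    print(msg or "All expected outputs present.")
```

The pointer lines are appended after the per-file errors so they remain the last thing a user sees, which matches the request to put the hints "at the bottom of the errors".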