NCAR / wrfcloud

WRF Cloud Framework
Apache License 2.0
15 stars 6 forks source link

WRF jobs don't always kill when seq fault is encountered #218

Open fossell opened 1 year ago

fossell commented 1 year ago

When wrf.exe encounters a seg fault, the job doesn't necessarily fail, it just hangs. since there was no wall clock time set, the job will sit running indefinitely. This occurred with a cfl/segfault error when testing the Pacific NW domain as part of the new config gui testing. Need more robust error/status checking or monitoring to catch these instances.

Temporary fail safe solution: Added a hard coded wall clock time of 12 hours so job won't run indefinitely.

Expected Behavior

If the wrf.exe encounters seg fault, need to

Environment

Describe your runtime environment: 1. Machine: (e.g. HPC name, Linux Workstation, Mac Laptop) 2. OS: (e.g. RedHat Linux, MacOS) 3. Software version number(s)

To Reproduce

Describe the steps to reproduce the behavior: 1. Go to '...' 2. Click on '....' 3. Scroll down to '....' 4. See error

Relevant Deadlines

List relevant project deadlines here or state NONE.

Define the Metadata

Assignee

Labels

Projects and Milestone

Bugfix Checklist