E3SM-Project / ACME-ECP

E3SM MMF for DoE ECP project

Fails to restart when history includes crm-level output #91

Closed: crjones-amath closed this issue 5 years ago

crjones-amath commented 5 years ago

Recent Early Science runs failed to restart with the following error:

ERROR: set_field_dimensions: mdim size must be > 0

This failure resembles the one described in https://github.com/E3SM-Project/E3SM/issues/833. The error has been reproduced on Summit at both ne120 and ne4 resolutions whenever the history output includes CRM-level fields with dimensions such as crm_nx and crm_nx_rad.

Hypothesis and possible solution

I believe this is related to the CRM-specific coordinates not being found when history files are written after a restart. One option is to change some of the CRM pbuf_add_field calls to use global instead of physpkg scope, but that also fails. Alternatively, the analogous COSP error was solved by moving the add_hist_coord calls to phys_register, so the same approach may work here.
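To make the ordering hypothesis concrete, here is a toy Python model. It is illustrative only and none of it is E3SM code: the class HistoryCoords, the function initialize, and the coordinate sizes are hypothetical, and add_hist_coord / set_field_dimensions only mimic the roles of the real routines. The key assumption is that on a restart the history field dimensions are rebuilt before the code path that registers the crm_* coordinates has run.

```python
"""Toy model of the restart-ordering hypothesis (illustrative only, not E3SM code)."""

# Hypothetical coordinate sizes; real values depend on the CRM configuration.
CRM_COORDS = {"crm_nx": 64, "crm_nx_rad": 16}


class HistoryCoords:
    """Stand-in for the history coordinate registry."""

    def __init__(self):
        self.sizes = {}

    def add_hist_coord(self, name, size):
        self.sizes[name] = size

    def set_field_dimensions(self, dims):
        # Mimic the reported failure: an unregistered coordinate has size 0.
        sizes = [self.sizes.get(d, 0) for d in dims]
        if min(sizes) <= 0:
            raise RuntimeError("set_field_dimensions: mdim size must be > 0")
        return sizes


def initialize(restart, coords_in_register):
    """Run a simplified init sequence; return the CRM field's dimension sizes."""
    coords = HistoryCoords()

    def add_crm_coords():
        for name, size in CRM_COORDS.items():
            coords.add_hist_coord(name, size)

    # Registration phase: runs first on both initial and restart runs.
    if coords_in_register:
        add_crm_coords()

    if restart:
        # Assumption: on restart, history dimensions are rebuilt from the
        # restart file before the diagnostics-init path has run.
        dims = coords.set_field_dimensions(list(CRM_COORDS))
        if not coords_in_register:
            add_crm_coords()
    else:
        # Initial run: diagnostics init registers coordinates before history setup.
        if not coords_in_register:
            add_crm_coords()
        dims = coords.set_field_dimensions(list(CRM_COORDS))
    return dims
```

Under these assumptions, an initial run succeeds either way, but a restart only succeeds when the coordinates are registered during the register phase, which is consistent with the COSP fix.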

If possible, the fix should also allow us to restart the current ne120 Early Science run without having to start fresh.

crjones-amath commented 5 years ago

I'm not adept with the history/restart parts of the code. Pinging @singhbalwinder to request guidance on this.

crjones-amath commented 5 years ago

Update: I've verified that moving add_hist_coord(crm_*,...) to crm_physics_register solves our restart problem for the early science branch. The fix is in crjones/crm/restart_fix (relevant diffs here). I will issue a PR with these changes in a few days.

singhbalwinder commented 5 years ago

It's great that you have found a fix for this issue. Your fix makes sense to me. Do you know why we need to move these lines from cam_diagnostics? One possible reason is that this section of the code is never executed for CRM configurations. Another is that these lines are executed, but not in the required sequence.