dqwu closed this PR 1 year ago.
This PR has been tested with a scream ne1024 F case on Summit. The hang that occurred when writing a huge (about 2 TB) restart file appears to be fixed.
If E3SM triggers this (i.e., needs to move data in order to expand the header), maybe we should abort with a message about increasing the header space via the option suggested above, rather than continue to run slowly?
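A minimal Python sketch of the check suggested above, under a simplified layout model: the metadata header sits at the start of the file, the data section begins at offset `data_begin`, and `h_minfree` is the free space requested after the header (loosely analogous to the `h_minfree` argument of NetCDF's `nc__enddef`). All names here are hypothetical and are not SCORPIO's API.

```python
class HeaderSpaceError(RuntimeError):
    """Hypothetical error raised instead of silently shifting a huge data section."""


def enddef_needs_data_move(header_used: int, data_begin: int, h_minfree: int) -> bool:
    """True if guaranteeing h_minfree free bytes after the header
    would force the data section (starting at data_begin) to move."""
    return header_used + h_minfree > data_begin


def check_enddef(header_used: int, data_begin: int, h_minfree: int) -> None:
    """Abort with an actionable message rather than continue slowly."""
    if enddef_needs_data_move(header_used, data_begin, h_minfree):
        shortfall = header_used + h_minfree - data_begin
        raise HeaderSpaceError(
            f"header needs {shortfall} more bytes than reserved; "
            "increase the reserved header space and rerun"
        )
```

In this model the cost of a move is proportional to the size of the data section, which is why failing early may be preferable for multi-terabyte files.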
This is an issue that needs to be fixed in SCORPIO, since the extra header space should not be reserved (extended) more than once (and @dqwu already has a fix for this). The risk of not having enough header space can be addressed by making the amount of reserved header space a configurable option.
@dqwu: Can you also add a configure option to control the number of bytes of header space reserved? (One potential issue is that the header space reserved here might not be enough to accommodate metadata added by the application during future redefs, or metadata changes made during post-processing.)
OK, I have added this configure option.
Avoid reserving extra space (padding) in the output NetCDF file header more than once.
After the header section has been expanded, a new reservation with the same request as before may require moving (shifting) data, even if the header already has sufficient free space for the new metadata. This can be very expensive when the data sections are huge, which is very common for output files of some ultra-high-resolution E3SM/scream cases.
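The effect described above can be illustrated with a small Python model, a sketch only and not SCORPIO's implementation. It assumes a NetCDF-style layout (header, then data) and an `enddef` that guarantees `h_minfree` free bytes after the header by shifting the whole data section when needed. The class, the two policy functions, and the byte counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileLayout:
    """Hypothetical model of a NetCDF-style file: header first, then data."""
    header_used: int               # bytes occupied by metadata
    data_begin: int                # offset where the data section starts
    data_size: int = 0             # bytes already written to the data section
    bytes_moved: int = 0           # total bytes shifted (proxy for cost)
    padding_reserved: bool = False

    def enddef(self, h_minfree: int) -> None:
        """Guarantee h_minfree free bytes after the header, shifting data if needed."""
        required = self.header_used + h_minfree
        if required > self.data_begin:
            self.bytes_moved += self.data_size  # the whole data section moves
            self.data_begin = required


def enddef_every_time(f: FileLayout, reserve: int) -> None:
    """Old behaviour: request the same padding on every enddef."""
    f.enddef(reserve)


def enddef_reserve_once(f: FileLayout, reserve: int) -> None:
    """Fixed behaviour: request padding only on the first enddef."""
    f.enddef(0 if f.padding_reserved else reserve)
    f.padding_reserved = True


def simulate(policy, reserve: int, growth: int, data_size: int) -> int:
    """Define, write data, redef adding `growth` metadata bytes; return bytes moved."""
    f = FileLayout(header_used=1000, data_begin=1000)
    policy(f, reserve)         # first enddef on an empty file: nothing to move
    f.data_size = data_size    # large variables are written
    f.header_used += growth    # redef, e.g. a new attribute is added
    policy(f, reserve)         # second enddef
    return f.bytes_moved
```

With `reserve=10240` and a 100-byte attribute added after writing 1 GiB, `simulate(enddef_every_time, ...)` moves the full 1 GiB although the padding already had room for the attribute, while `simulate(enddef_reserve_once, ...)` moves nothing. If the later metadata exceeds the reserved padding (e.g. `growth=20000`), data still has to move even with reserve-once, which is why the reservation size is worth making configurable.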
Follow-up to PR #448.