E3SM-Project / scorpio

A high-level Parallel I/O Library for structured grid applications

Reserve some extra space in the header only once when creating NetCDF files #487

Closed. dqwu closed this 1 year ago

dqwu commented 1 year ago

Avoid reserving extra space (padding) in the output NetCDF file header more than once.

Once the header section has been expanded, a new reservation with the same request as before may involve moving (shifting) data, even when the header still has sufficient free space. This can be very expensive if the data sections are huge, which is very common for output files of some ultra-high-resolution E3SM/scream cases.

Follow-up to PR #448.
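
For readers skimming the thread, the reserve-once idea can be sketched roughly as below. This is a minimal illustration, not the code in this PR: the `file_state` struct, its `header_space_reserved` flag, and the `header_pad_for_enddef` helper are hypothetical names invented for the example, and the padding size is supplied by the caller.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-file state; not SCORPIO's actual data structure. */
typedef struct {
    bool header_space_reserved; /* true once extra header space was requested */
} file_state;

/* Returns the number of extra header bytes to request when leaving
 * define mode: the full padding the first time, zero on later calls,
 * so that a redef/enddef cycle never repeats the reservation. */
size_t header_pad_for_enddef(file_state *f, size_t pad_bytes)
{
    if (f->header_space_reserved)
        return 0;
    f->header_space_reserved = true;
    return pad_bytes;
}
```

With this pattern only the first call for a given file asks for padding; later calls request none, so they do not give the library a reason to shift the data sections.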

dqwu commented 1 year ago

This PR has been tested with a scream ne1024 F case on Summit. The hanging issue when writing a huge restart file (about 2 TB) seems to be fixed.

mt5555 commented 1 year ago

If E3SM triggers this (i.e., it needs to move data in order to expand the header), maybe we should abort with a message about increasing the header space via the option suggested above, rather than continue to run slowly?

jayeshkrishna commented 1 year ago

This is an issue that needs to be fixed in SCORPIO, since the extra header space should not be reserved/extended multiple times (and @dqwu already has a fix for handling this). The issue of potentially not having enough header space can be fixed by making the amount of header space reserved a configurable option.

dqwu commented 1 year ago

@dqwu: Can you also add a configure option to control the number of bytes of header space reserved? One potential issue is that the header space reserved here might not be enough to accommodate the metadata added by the application during future redefs and metadata changes during post-processing.

OK, I have added this configure option.
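
As an illustration only, a build-time option of this kind might surface in C as a macro with a default that the build system can override. The macro name `EXAMPLE_RESERVED_HEADER_BYTES`, the helper function, and the default value below are all hypothetical placeholders; none of them are taken from SCORPIO.

```c
#include <stddef.h>

/* Hypothetical option name and placeholder default; neither is SCORPIO's. */
#ifndef EXAMPLE_RESERVED_HEADER_BYTES
#define EXAMPLE_RESERVED_HEADER_BYTES 10240
#endif

/* Extra header bytes to request when a file is created. */
size_t example_reserved_header_bytes(void)
{
    return (size_t)EXAMPLE_RESERVED_HEADER_BYTES;
}
```

Compiling with a definition such as `-DEXAMPLE_RESERVED_HEADER_BYTES=1048576` would replace the default, which is one way an application that expects to add a lot of metadata later could ask for more room up front.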