E3SM-Project / E3SM

Energy Exascale Earth System Model source code. NOTE: use "maint" branches for your work. Head of master is not validated.
https://docs.e3sm.org/E3SM

Move to CDF5 as the default I/O format #4288

Open jayeshkrishna opened 3 years ago

jayeshkrishna commented 3 years ago

NetCDF Formats discussed here,

More users are running into problems with the default I/O format (PIO_NETCDF_FORMAT = 64bit_offset, i.e. CDF2) when writing out large variables (> 4 billion elements). The CDF2 format requires variables to have fewer than 4 billion elements, so users writing larger variables must explicitly change PIO_NETCDF_FORMAT to 64bit_data.
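As a rough illustration of the size check described above, here is a minimal Python sketch. The helper name `suggest_pio_netcdf_format` is hypothetical, and the exact threshold of 2**32 elements is an assumption standing in for the "4 billion elements" figure quoted here; it is not taken from the PIO or NetCDF source.

```python
from math import prod

# Assumed stand-in for the "4 billion elements" CDF2 limit mentioned above.
CDF2_MAX_ELEMENTS = 2**32


def suggest_pio_netcdf_format(shape):
    """Return the PIO_NETCDF_FORMAT value a variable of this shape would need.

    `shape` is a sequence of dimension lengths. This helper is hypothetical
    and only mirrors the rule of thumb from the issue text.
    """
    n_elements = prod(shape)
    if n_elements >= CDF2_MAX_ELEMENTS:
        return "64bit_data"    # CDF5
    return "64bit_offset"      # CDF2, the current default


if __name__ == "__main__":
    # 72 x 1000 x 1000 elements fits comfortably in CDF2.
    print(suggest_pio_netcdf_format((72, 1000, 1000)))
    # 100000 x 50000 = 5 billion elements needs CDF5.
    print(suggest_pio_netcdf_format((100000, 50000)))
```

In an existing case the format itself would still need to be changed by the user, for example with CIME's `xmlchange` (`./xmlchange PIO_NETCDF_FORMAT=64bit_data`).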

However, changing the default to CDF5 (PIO_NETCDF_FORMAT = 64bit_data) requires NetCDF library >= 4.6.1 on all E3SM machines (for post-processing), because bugs in NetCDF versions before 4.6.1 cause problems when reading files in the CDF5 format.
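One way to audit a machine against the 4.6.1 requirement is a small version comparison, sketched below. The function names are hypothetical; the version string is assumed to come from something like the output of `nc-config --version` or, in Python, `netCDF4.__netcdf4libversion__`.

```python
import re

# Minimum NetCDF version quoted above for reading CDF5 files reliably.
MIN_CDF5_SAFE_VERSION = (4, 6, 1)


def parse_netcdf_version(text):
    """Extract (major, minor, patch) from a string such as 'netCDF 4.7.4'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def reads_cdf5_safely(version_text):
    """True if the reported NetCDF version is at least 4.6.1."""
    return parse_netcdf_version(version_text) >= MIN_CDF5_SAFE_VERSION


if __name__ == "__main__":
    for reported in ("netCDF 4.4.1", "4.6.1", "4.7.4"):
        print(reported, reads_cdf5_safely(reported))
```

Comparing integer tuples rather than raw strings avoids mis-ordering versions such as 4.10.0 and 4.6.1.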

So we need to

rljacob commented 3 years ago

I thought we already decided that CDF5 was the default years ago. But on reviewing that decision, it only said that the input data should NOT be NetCDF4. We did note in the explainer https://acme-climate.atlassian.net/wiki/spaces/DOC/pages/1007223420/NetCDF+explainer that 4.6.1 or later was needed to avoid bugs and that all analysis tools should build with those versions.

rljacob commented 3 years ago

This should really be done in Confluence as a Decision Page.

jayeshkrishna commented 3 years ago

OK, let us first decide on this change in a Confluence decision page (though it might be worthwhile exploring whether we can safely upgrade to NetCDF 4.6.x on all the machines).

xylar commented 3 years ago

At risk of prolonging a discussion that should move elsewhere, up to now the analysis tools that use E3SM-Unified will use the conda-forge version of the NetCDF library, not the system version. This is currently 4.7.4 and will be 4.8.0 in the next E3SM-Unified. There may be other analysis that does not use E3SM-Unified and would instead be using the system NetCDF libraries, though. And for the next E3SM-Unified we do plan to support builds of certain tools (ESMF, ILAMB and TempestExtremes) using system libraries. For this work, it would be very important that system libraries be made more current. The new E3SM-Unified is scheduled for release on July 1st.

PeterCaldwell commented 3 years ago

It would be useful to post the confluence decision page this discussion got moved to in this thread.

jayeshkrishna commented 3 years ago

I just created the confluence page for further discussion on this issue - https://acme-climate.atlassian.net/wiki/spaces/EIDMG/pages/2741633051/Change+the+default+E3SM+output+file+format+to+CDF5 .

We will continue work on this issue based on the decision in the confluence page above.