NOAA-GFDL / FMS

GFDL's Flexible Modeling System

Supergrid input buffer size does not match read_data buffer in MOM OM4_025 test case #346

Open · wrongkindofdoctor opened this issue 4 years ago

wrongkindofdoctor commented 4 years ago

**Describe the bug**
The first call to the new read_data routines via the MOM wrappers from `set_grid_metrics_from_mosaic` (MOM_grid_initialize.F90#L273), when using the MOM6-examples ICE_OCEAN_SIS2/OM4_025 test with the new IO, generates the following error:

`FATAL from PE 256: NetCDF: Start+count exceeds dimension bound`

This occurs because MOM is passing an array `tmpZ` that is sized from the data domain (see the sketch below).

**To Reproduce**
Compile MOM6+new IO, SIS2+new IO, FMS 2020.01, and FMScoupler 2020.01 with Intel 18.0.6 in debug mode on gaea, then run the OM4_025 test experiment with 480 PEs. The model should crash when trying to read `x` from INPUT/ocean_hgrid.nc.
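For reference, a minimal sketch of the failing pattern, assuming the fms2_io interfaces (the wrapper details, axis registration, and index bounds are illustrative, not the actual MOM6 code):

```fortran
! Sketch only: 'x' in ocean_hgrid.nc is a supergrid variable of roughly
! (2*ni+1, 2*nj+1) points, while the buffer handed to the domain-decomposed
! read is sized from MOM's data domain, so the start+count computed by the
! new IO runs past the netCDF dimension bound.
use fms2_io_mod,     only: FmsNetcdfDomainFile_t, open_file, read_data, close_file
use mpp_domains_mod, only: domain2d

type(FmsNetcdfDomainFile_t) :: fileobj
type(domain2d)              :: domain              ! set up elsewhere
real, allocatable           :: tmpZ(:,:)
integer                     :: isd, ied, jsd, jed  ! data-domain bounds, set elsewhere

allocate(tmpZ(2*isd-2:2*ied+1, 2*jsd-2:2*jed+1))   ! ~2x the data domain
if (open_file(fileobj, 'INPUT/ocean_hgrid.nc', 'read', domain)) then
  call read_data(fileobj, 'x', tmpZ)   ! FATAL: Start+count exceeds dimension bound
  call close_file(fileobj)
end if
```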

**Expected behavior**
The data should be read into the buffer via the 2D domain-decomposed read_data interfaces. This is not an issue in small-PE runs of the ocean_only and simple SIS2 cases.

**System Environment**

This is another GitLab CI case, so contact me offline for run directories, executables, etc. @marshallward @menzel-gfdl

wrongkindofdoctor commented 4 years ago

To add a bit of context: set_grid_metrics_from_mosaic has four temporary arrays (tmpU, tmpV, tmpZ, tmpT) that are allocated as 2*(size of the data domain) +/- some offset, depending on the type of grid (C, B, etc.) they map to. To avoid performance issues, this routine reads 'x' from ocean_hgrid.nc into tmpZ and maps the values to G%geoLonT, G%geoLonBu, G%geolonCV, and G%geolonCu (see the sketch below), instead of doing four separate read_data calls with different-sized buffers. Similar procedures exist for 'y', 'dx', and 'dy'.
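Roughly, the pattern looks like this (a sketch: the supergrid holds points at twice the model resolution, so the four staggered locations are alternating supergrid points, but the exact index arithmetic and the MOM_read_data call here are illustrative):

```fortran
real, allocatable :: tmpZ(:,:)
integer :: i, j, isd, ied, jsd, jed     ! data-domain bounds, set elsewhere

allocate(tmpZ(2*isd-2:2*ied+1, 2*jsd-2:2*jed+1))   ! 2*(data domain) +/- offsets
call MOM_read_data(filename, 'x', tmpZ, G%Domain)  ! the read that now fails

! One supergrid read is fanned out to all four staggered grids:
do j=jsd,jed ; do i=isd,ied
  G%geoLonT(i,j)  = tmpZ(2*i-1, 2*j-1)  ! cell centers (T points)
  G%geoLonBu(i,j) = tmpZ(2*i,   2*j)    ! cell corners (B points)
  G%geoLonCu(i,j) = tmpZ(2*i,   2*j-1)  ! east faces (Cu points)
  G%geoLonCv(i,j) = tmpZ(2*i-1, 2*j)    ! north faces (Cv points)
end do ; end do
```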

In lieu of futzing with the MOM_io wrappers and adding more conditionals for this case (and others that may arise in CM and ESM configurations), one possible solution is simply to read the global data array using the non-domain-decomposed interfaces and map it to the data-domain indices in the temporary array (a rough sketch follows). I would need assistance from @marshallward or @nikizadehgfdl to do this correctly, though, since I am not the expert on MOM's indexing conventions.
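Something along these lines, with the caveat that the global-to-local index mapping is exactly the part that needs checking (the names, bounds, and offsets are placeholders):

```fortran
use fms2_io_mod, only: FmsNetcdfFile_t, open_file, read_data, close_file

type(FmsNetcdfFile_t) :: fileobj       ! no domain attached
real, allocatable :: tmpGlbl(:,:)
integer :: nxg, nyg                    ! global grid sizes, known from the mosaic

allocate(tmpGlbl(2*nxg+1, 2*nyg+1))    ! full supergrid on every PE
if (open_file(fileobj, 'INPUT/ocean_hgrid.nc', 'read')) then
  call read_data(fileobj, 'x', tmpGlbl)   ! global, non-domain-decomposed read
  call close_file(fileobj)
end if

! Copy the slab covering this PE's supergrid-sized data domain into tmpZ;
! the offsets between global and local indices are the detail that needs
! MOM indexing expertise.
tmpZ(:,:) = tmpGlbl(2*isd-2:2*ied+1, 2*jsd-2:2*jed+1)
deallocate(tmpGlbl)
```

The obvious trade-off is that every PE briefly holds the full supergrid, which may or may not matter at OM4_025 resolution.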

I'll add that the prevailing sentiment is that fms-io should "just know" what users want based on the domain description and array size. This is possible provided users pass in standard-sized data buffers and let the mpp routines handle the indexing, PE lists, etc. However, data arrays that have user-specified mpp procedures "pre-applied", so to speak, will cause the interfaces to choke, as is the case here. Moving forward, developers should aim to read/write data into simple arrays (e.g., see how tmpGlbl is used here), THEN slice-and-dice as needed in their own routines. Looping @menzel-gfdl into the discussion as well.

menzel-gfdl commented 4 years ago

The I/O is only set up to do domain-decomposed reads into buffers that are either the size of the compute domain, or the size of the compute domain + size of the halos. I don't think 4 reads vs. 1 read is really anything to worry about performance-wise, unless these reads are happening over and over again.
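In other words, this is the contract the new interfaces enforce (a sketch; the variable name and bounds are illustrative):

```fortran
type(FmsNetcdfDomainFile_t) :: fileobj   ! opened with a domain, as above
real, allocatable :: buf_c(:,:), buf_d(:,:)
integer :: isc, iec, jsc, jec   ! compute-domain bounds
integer :: isd, ied, jsd, jed   ! data-domain bounds (compute + halos)

allocate(buf_c(isc:iec, jsc:jec))   ! compute-domain sized
allocate(buf_d(isd:ied, jsd:jed))   ! data-domain sized
call read_data(fileobj, 'depth', buf_c)   ! OK
call read_data(fileobj, 'depth', buf_d)   ! OK
! A supergrid-sized buffer like tmpZ matches neither shape, hence the FATAL.
```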

marshallward commented 4 years ago

I'm sorry that I haven't answered, as this isn't something that I have looked at before or know much about. But I am guessing the issue is because you are reading x on a supergrid into tmpZ, and the new IO is tripping up because the grid is an unexpected shape, is that right?

Is it possible to just read in the local segment as ungridded data and then fill in the tmp* variables?

wrongkindofdoctor commented 4 years ago

@marshallward Correct. I can read in the segment instead of the whole array since the new IO handles start and count arguments, but I'll probably need help getting the indexing correct. Stay tuned.
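For concreteness, a sketch of that segment read, assuming the new IO's corner/edge_lengths (start/count) arguments on the non-domain-decomposed interface; the supergrid offsets here are placeholders for the indexing that still needs to be worked out:

```fortran
use fms2_io_mod, only: FmsNetcdfFile_t, open_file, read_data, close_file

type(FmsNetcdfFile_t) :: fileobj     ! non-domain-decomposed file object
integer :: start(2), nread(2)

start = (/ 2*isd-1, 2*jsd-1 /)       ! placeholder supergrid offsets for this PE
nread = (/ size(tmpZ,1), size(tmpZ,2) /)
if (open_file(fileobj, 'INPUT/ocean_hgrid.nc', 'read')) then
  call read_data(fileobj, 'x', tmpZ, corner=start, edge_lengths=nread)
  call close_file(fileobj)
end if
```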