diag_manager: Regional subsetting for u-point variables does not seem to be using correct grid

ashao commented 5 years ago

The diag_manager may not be using the correct domain when trying to find the array indices for the regional subsetting. This problem can be demonstrated using the Pacific_undercurrent umo diagnostic in the OM4_025 test case. Essentially what seems to be happening is that the vector of longitudes at the corner (q) points (which are ostensibly placeholders and not representative of the tripolar grid) are incorrectly being used to find the point closest to the requested longitudinal point. Instead the diagnostic manager should use the full 2D lat/lon arrays to try to find the correct indices for the requested subregion.

A notebook demonstrating the case where this was found in more detail can be found https://gist.github.com/ashao/77a8cdc20eb18dd4adb7f388e816bd6a. To use for yourself, any ocean_static.nc file from an OM4_025 run can be used, other model outputs can be found in a run that @nikizadehgfdl did:

/lustre/f2/scratch//oar.gfdl.ogrp-account/work/nnz/xanadu_mom6_devgfdl_20190412/OM4p25_IAF_BLING_csf_rerun_ShaoDiags_1x1m0d_1756x1o.o268462119/output.stager/lustre/f2/scratch/oar.gfdl.ogrp-account/nnz/xanadu_mom6_devgfdl_20190412/OM4p25_IAF_BLING_csf_rerun_ShaoDiags/ncrc4.intel16-prod/archive/1x1m0d_1756x1o/history/17080101.nc/

nikizadehgfdl commented 5 years ago

@underwoo is this the bug being worked at by the MS?

underwoo commented 5 years ago

@nikizadehgfdl we are looking at this issue now to see if/how we can resolve this. If you are asking if this is related to the non-combined regional output files not containing the same set of variables/dimensions, I do not have an answer to that ATT. We may be able to determine that as we investigate this issue further.

ashao commented 5 years ago

@underwoo I was thinking about this problem more and I'm not entirely sure that there's a fully generalizable solution short of doing some kind of interpolation. Perhaps the 'correct' way forward is that instead of specifying the lat/lon of the requested region, the global i,j indices of the requested axes are indicated. This makes it more annoying to add a regionally diagnostic because the user has to look up and set those manually, but it allows more direct control of the output.

StephenGriffies commented 5 years ago

FYI @ashao : when I built the diag table for OM4_p25 and OM4_p5, I tweaked the lat/lon values according to what I found when running the model and where the land/sea masks were located. I am fairly confident that one needs to do a similar iterative approach for each topography/grid configuration whereby one looks at land/sea masks carefully and tweaks the lat/lon values in the diag table accordingly. It is too risky to rely solely on diag table without checking the model run, given that topography is so dependent on the grid spacing.

So at least for OM4_p25 and OM4_p5, I hope that the lat/lon settings in the diag table are properly capturing a land-to-land crossing that provides a proper estimate of the transports.

ashao commented 5 years ago

@StephenGriffies Specifying the i,j indices directly avoids needing to iterative process that you had to do. For the Equatorial Undercurrent, instead of

 "ocean_model_z", "uo",   "uo", "ocean_Pacific_undercurrent",  "all", "mean", "-155. -155. -2. 2. -1 -1",2

you would specify the following

"ocean_model_z", "uo",   "uo", "ocean_Pacific_undercurrent",  "all", "mean", "580 580 496 512 -1 -1",2

The latter means you wouldn't have guess and check what the diagnostic manager is doing and make it unambiguous that the diag table is requesting a rectangular subset of the domain.

Example code to find regional indices

print(np.nonzero( (geolon_u == -155) & (geolat_u >=-2) & (geolat_u <= 2)) )
(array([495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507,
        508, 509, 510, 511]),
 array([579, 579, 579, 579, 579, 579, 579, 579, 579, 579, 579, 579, 579,
        579, 579, 579, 579]))
# Note need to add 1 because Python uses 0-indexing and Fortran uses 1-indexing

StephenGriffies commented 5 years ago

Yes, this is a good point. Very useful Python code @ashao . Many thanks!

ashao commented 5 years ago

This issue was reraised today at the MOM6 dev meeting in response to @milicak's issue: https://github.com/NOAA-GFDL/MOM6-examples/issues/285.

Generally, it seemed to be preferable to the group to be able to specify the global indices. Has there been any internal discussion about a path/timeline forward to implement a solution within FMS?

wrongkindofdoctor commented 5 years ago

Ashao,

Thank you very much for raising the issue and for providing a potential solution to the issue. FMS has experienced some internal personnel shuffling since your initial post, and I am now the manager for the repository, as well as in charge of implementing new I/O interfaces in MOM6. The outcome of the rewrite will be that MOM6 reads and writes netCDF files via direct calls to the new IO in the FMS github repository rather than using the current practice of using a combination of old FMS, mpp, and netCDF calls .

While I have not examined the problem you are having in depth yet, I have had a potentially related issue replacing some of the read_data calls in the super grid initialization scheme because of the way the user-specified domains are indexed, and have been working with the MOM6 developers and @menzel-gfdl to solve the problem.

We have replaced the IO routines in diag_manager in the fms-io-dev branch, and there is a chance that the diag-manager bug you encountered has been fixed. You may want to try doing a git checkout fms-io-dev in your FMS source directory (assuming you are building with the FMS github repo), and recompiling your executable in debug mode.

Below is a snippet from one of the fre-generated build scripts that I'm using to compile a single node (36-pe) ocean-only configuration for testing on Gaea. The script uses the GFDLlist_paths and mkmf tools provided in the mkmf repo to search the $src_dir for the files to add to the component (FMS, MOM6, and ocean_BGC in this case) makefiles, and build the makefiles with gnu 7. Note the presence of -IFMS/fms2_io/include in the mkmf argument list for the FMS repo. If you are using a similar script to build your makefiles, this should be the only line for the FMS repo you have to change after you have switched to the fms-io-dev branch in your $src_dir/FMS :

#!/bin/tcsh -fx
# ---------------- Set build and src directories
set src_dir = /lustre/f2/dev/$USER/MOM6_solo/mom6_solo_compile/src
set bld_dir = /lustre/f2/dev/$USERk/MOM6_solo/mom6_solo_compile/ncrc3.gnu7-debug/exec
# ---------------- Make template
set mkmf_template = /ncrc/home2/fms/local/opt/fre-commands/bronx-15/site/ncrc3/gnu.mk

# ---------------- write main Makefile
sed -e 's/<TAB>/\t/' >$bld_dir/Makefile <<END
# Makefile for Experiment 'mom6_solo_compile'

SRCROOT = $src_dir/
BUILDROOT = $bld_dir/

MK_TEMPLATE = $mkmf_template
include \$(MK_TEMPLATE)

fms_mom6_solo_compile.x: mom6/libmom6.a ocean_BGC/libocean_BGC.a fms/libfms.a
<TAB>\$(LD) \$^ \$(LDFLAGS) -o \$@ \$(STATIC_LIBS)

fms/libfms.a:  FORCE
<TAB>\$(MAKE) SRCROOT=\$(SRCROOT) BUILDROOT=\$(BUILDROOT) MK_TEMPLATE=\$(MK_TEMPLATE)  --directory=fms \$(@F) 

ocean_BGC/libocean_BGC.a: fms/libfms.a FORCE
<TAB>\$(MAKE) SRCROOT=\$(SRCROOT) BUILDROOT=\$(BUILDROOT) MK_TEMPLATE=\$(MK_TEMPLATE)  --directory=ocean_BGC \$(@F) 

mom6/libmom6.a: fms/libfms.a FORCE
<TAB>\$(MAKE) SRCROOT=\$(SRCROOT) BUILDROOT=\$(BUILDROOT) MK_TEMPLATE=\$(MK_TEMPLATE) OPENMP="" --directory=mom6 \$(@F) 

FORCE:

clean:
<TAB>\$(MAKE) --directory=fms clean
<TAB>\$(MAKE) --directory=ocean_BGC clean
<TAB>\$(MAKE) --directory=mom6 clean

localize:
<TAB>\$(MAKE) -f \$(BUILDROOT)fms/Makefile localize
<TAB>\$(MAKE) -f \$(BUILDROOT)ocean_BGC/Makefile localize
<TAB>\$(MAKE) -f \$(BUILDROOT)mom6/Makefile localize

distclean:
<TAB>\$(RM) -r fms
<TAB>\$(RM) -r ocean_BGC
<TAB>\$(RM) -r mom6
<TAB>\$(RM) fms_mom6_solo_compile.x
<TAB>\$(RM) Makefile

END
# ---------------- create component Makefiles
mkdir -p $bld_dir/FMS
list_paths -o $bld_dir/FMS/pathnames_fms $src_dir/FMS $src_dir/FMS/include $src_dir/FMS/mpp/include 
cd $bld_dir
pushd FMS
mkmf -m Makefile -a $src_dir -b $bld_dir -p libfms.a -t $mkmf_template -g -c "-Duse_libMPI -Duse_netCDF -Duse_netCDF3 -DINTERNAL_FILE_NML -g -DMAXFIELDMETHODS_=400" -IFMS/fms2_io/include -IFMS/include -IFMS/mpp/include -Imom6/src/MOM6/pkg/CVMix-src/include $bld_dir/fms/pathnames_fms
popd

mkdir -p $bld_dir/ocean_BGC
list_paths -o $bld_dir/ocean_BGC/pathnames_ocean_BGC $src_dir/ocean_BGC/generic_tracers $src_dir/ocean_BGC/mocsy/src 
cd $bld_dir
pushd ocean_BGC
mkmf -m Makefile -a $src_dir -b $bld_dir -p libocean_BGC.a -t $mkmf_template -g -c "-DINTERNAL_FILE_NML -g" -o "-I$bld_dir/fms" -IFMS/fms2_io/include -IFMS/include -IFMS/mpp/include -Imom6/src/MOM6/pkg/CVMix-src/include $bld_dir/ocean_BGC/pathnames_ocean_BGC
popd

mkdir -p $bld_dir/mom6
list_paths -o $bld_dir/mom6/pathnames_mom6 $src_dir/mom6/src/MOM6/config_src/dynamic $src_dir/mom6/src/MOM6/config_src/solo_driver $src_dir/mom6/src/MOM6/src/*/ $src_dir/mom6/src/MOM6/src/*/*/ $src_dir/mom6/src/MOM6/pkg/CVMix-src/include $src_dir/ocean_BGC/generic_tracers $src_dir/ocean_BGC/mocsy/src 
 setenv MAIN_PROGRAM mom6/MOM_driver.o 
cd $bld_dir
pushd mom6
mkmf -m Makefile -a $src_dir -b $bld_dir -p libmom6.a -t $mkmf_template -g -c "-DINTERNAL_FILE_NML -g -DMAX_FIELDS_=100 -DNOT_SET_AFFINITY -D_USE_MOM6_DIAG -D_USE_GENERIC_TRACER -DUSE_PRECISION=2" -o "-I$bld_dir/fms" -IFMS/fms2_io/include -IFMS/include -IFMS/mpp/include -Imom6/src/MOM6/pkg/CVMix-src/include $bld_dir/mom6/pathnames_mom6
popd
# ---------------- call make on the main Makefile
make  DEBUG=on NETCDF=3 fms_mom6_solo_compile.x

ashao commented 5 years ago

Thank you @wrongkindofdoctor for your response. Also, my gratitude for taking on the task of overseeing the these improvements to the I/O routines!

From my reading of the diag_manager code on the fms-io-dev branch, I think that the original problem would persist. The fundamental problem is that the get_subfield_size in diag_util.F90 assumes that anything not on a cubed sphere grid, is a regular Cartesian grid. https://github.com/NOAA-GFDL/FMS/blob/fd9f18229b3181e1237b532192d533a35d367251/diag_manager/diag_util.F90#L243-L321.

Currently, the diagnostic manager only knows about the placeholder lon, lat 1d vectors for any non-cubed sphere grid. For any other irregular grid (like the tripolar one used in the OM4 and SPEAR configurations), this will necessarily yield incorrect results in https://github.com/NOAA-GFDL/FMS/blob/fd9f18229b3181e1237b532192d533a35d367251/diag_manager/diag_util.F90#L637-L712

As mentioned previously and confirmed yesterday in our discussions with @StephenGriffies, @Hallberg-NOAA, @adcroft, @gmarques, and others: the most accurate way forward would be for the user to specify the global indices directly. This allows us to ensure that something like a strait transport diagnostic is a fully 'connected' line instead of a potential error in double-counting an algorithm that tries to map the requested lat/lon of the line to the closest point in the true 2D lat/lon arrays

My thought for a relatively 'easy' way forward would be to create a new coord_type in diag_data.F90 (call it coord_type_indices) which would store the spatial_ops part of the field specification if it was all integers

     TYPE coord_type_indices
        INTEGER :: xbegin
        INTEGER :: xend
        INTEGER :: ybegin
        INTEGER :: yend
        INTEGER :: zbegin
        INTEGER :: zend
     END TYPE coord_type_indices

Then it seems like to implement this feature fully, requires at least 3 steps

coord_type_indices would need to be added to field_description_type,
Change init_output_field to basically do nothing if it coord_type_indices is populated
Modify get_subfield_size to just use the information in coord_type_indices

There is some ambiguity if the user wanted a regional subset whose lat/lon/depth were all whole numbers. For example, if someone wanted to take a subset of temperature in the region spanned by 10E-20E, 20N-40N, 10-50m and wrote the field line as 10 20 20 40 10 50 then what would actually be output was the temp(20:40,10:20,10:50) instead of the actual geographic location. That being said, so long as this behavior is documented that you must provide all reals or all integers it 1) could be guarded against or 2) left as an exercise to the user.

wrongkindofdoctor commented 5 years ago

Thanks for providing the code that is causing the issue. I'll discuss your suggested fix with @thomas-robinson, and we'll determine how to proceed.

ashao commented 5 years ago

Thanks, I'll pass it along at the next MOM6 dev meeting

wrongkindofdoctor commented 5 years ago

@underwoo Per the suggestions from other team members, @thomas-robinson and I will check to see whether horiz_interp has capability to support indexing on the tripolar grid. FMS will not support user-specified indices because the number of bytes associated with a particular integer KIND is compiler-dependent, and the indices themselves are resolution-dependent; the implementation is, therefore, tricky and error-prone.

StephenGriffies commented 5 years ago

I observe that we have documented (verbally from savvy users/developers) errors with the current setup, whereas well documented user-specified indices would have resolved the problem with these folks. My hope is that a desire to realize code that is not "tricky and error-prone" is in turn not going to indefinitely compromise the current need to resolve the present code setup, which can be classified as "very trick and very error-prone."

ashao commented 5 years ago

The resolution-dependence of the indices was discussed and deemed a desirable aspect of it. I'm not entirely sure how reading in user supplied indices would really affect portability between compilers, so long as the KIND of integer when compiled was 16-bits or more. Could you clarify what you mean by this? For some quantities like transports through straits where it really does matter what the actual prognostic transport is, I'm leary of relying on horiz_interp because of the risk of double-counting transports.

Maybe this would be a good time for some of the MOM6 and FMS folks should get together and make sure we're all on the same page.

ashao commented 5 years ago

BTW, it should be @gustavo-marques, not gmarques who should be tagged in this as well.

wrongkindofdoctor commented 5 years ago

@ashao @gustavo-marques I think a face-to-face meeting is a good idea, and will email both of you, and a couple of other FMS folks who may want to attend, to coordinate a time. The Fortran integer definition portability issue is described here (scroll down to the second answer). While the larger concern is ensuring correct indexing, the lack of standard definitions for integer KIND means that users will have to double check that the compiler indices correspond to the correct KIND value (e.g., integer (kind=4) is a 4-byte integer).

marshallward commented 5 years ago

Are we not just talking about specifying grid points, which are pairs (tuples) of integers? I don't see what this problem has to do with integer type.

If there's some constraint that I don't understand, can we not just use use iso_fortran_env, only : int64? AFAIK MOM6 and FMS are both already doing this.

wrongkindofdoctor commented 5 years ago

@marshallward if this is the case, then we probably don't need to worry about integer type definitions. Someone raised the concern with inconsistent Fortran kind definitions at the FMS meeting, so I made sure to mention it in the conversation.

ashao commented 5 years ago

Thank you @wrongkindofdoctor for continuing the dialogue, providing additional the information, and for being willing to discuss alternate ways forward this with us. Thank you also @marshallward for being slightly confused with me, as I also was not quite clear on how the portability issue really would manifest. And to your question, yes it would be adding support for to read in a tuple of 6 integers (representing x, y, z grid indices) in place of (but retaining support for) a tuple of 6 reals (lat/lon/depth ranges). Your suggestion of explicitly typing the INTEGER seems sensible to me at least until we start using grids with 2^63-1 points.

If you think it would be useful to have a meeting to clarify the current limitations and our concerns going forward, I'd suggest the following folks from the MOM6 side: @ashao: as the original rabble-rouser @StephenGriffies: who had to fight with the current way of doing this for the OMIP-requested diagnostics @gustavo-marques: to represent NCAR who will need to go through a similar exercise in their MOM6 configuration @marshallward: to represent the GFDL MOM6-ers (or alternatively we could just let you know what transpires afterward).

ashao commented 5 years ago

I also wanted to slightly push back on the characterization that ensuring correct indexing due to resolution dependence would be a new potential source of error. The current diag_manager is currently forcing users to 'guess and check' in an unclear way.

@StephenGriffies mentioned that he had 'trick' the current FMS implementation by deliberately putting in incorrect longitudinal/latitudinal extents for the straits in order to get the full transport. For example in the current OM4_025 configuration, the eastern extent for Fram Strait is specified at 0E, which geographically is in the middle of the strait, but because of the issue detailed here, the actual output will return diagnostics which span the entirety of the strait.

https://github.com/NOAA-GFDL/MOM6-examples/blob/2ba372d642b1e7fe01bed15b927e0d7627c26cf7/ice_ocean_SIS2/OM4_025/diag_table.MOM6#L295-L300

thomas-robinson commented 5 years ago

It seems as thought diag_manager is having difficulty supporting the lat/lon of the tripolar grid, which needs to be fixed. You are supposed to give lat/lon which are reals, but if you specify indices, then you will be sending integers. What happens next is unknown and compiler dependent. Will it fail because there is no routine that accepts integers? Will it cast the values to real and then proceed? If you manage to get the routine to work, how will it know if you are sending it grid index values or lat/lon coordinates?

The diag_manager must support the tripolar grid and work correctly if you provide the correct lat/lon values. You shouldn't need to trick it. If you have to trick it, then we need to fix it. It makes more sense to me to fix what is already there instead of having code that doesn't work half of the time and writing new code.

marshallward commented 5 years ago

Looking at data_table.F90, it seems that FMS is parsing the coords as a string, saves it to spatial_ops inside of a field_description_type struct, which is then moved into regional_coords if not none, which is finally where the data is saved in a coord_type as a 6-tuple of reals.

It seems that this process could be interrupted at the spatial_ops -> regional_coords conversion to save the tuple as integers rather than floats if there were some way to signal this method. Since we're talking about a generic string, there ought to be a number of options here.

I also don't foresee any portability issues, since we're relying on Fortran runtime to do the string->number conversions.

But I also agree that it ought to work under tripole grids, and if the problem is as @ashao described where it's using the axis -- whch is largeely nonsense in the tripole region -- rather than the grid, then that would be the problem.

ashao commented 5 years ago

@thomas-robinson you raise good points and I agree that 1) it will take a bit of code to figure out if the user is specifying a tuple of REALs or INTEGERS, but again I don't really see how this is a portability problem and 2) that the code should be fixed (or at least hard fail) for the tripolar grid.

@marshallward: That was essentially what I was thinking that it would be an intercept of whether the indices should be computed or not based on the parsed type of the tuple. One way to distinguish it would be to parse the tuple and check to see if all of them are INTEGER.

do k=1,6
  read(*,'(I)',iostat=err) index_int
  if (err .ne. 0) then
    is_coords = .true.
    exit
enddo
! Sprinkle the rest of magic to make this work

Here's another use case to support regional subsetting by indices: The fundamental problem regardless of whether the algorithm is 'fixed' or not is that specifying a [minlon maxlon minlat maxlat] assumes that a rectangular box can be defined exactly on a discrete, irregular grid. For some applications, it's probably 'good enough' to have something that's not quite a connected line to form the boundary of regional subset. However this is an acute problem for diagnostics which might be used to do drive regional models. For example, imagine that global OM4_025 is being used to drive a regional model which is a perfect integer refinement of the OM4_025 grid. If that integer refinement is 1 (e.g., for example the Baltic_OM4_025 case which is literally a manually snipped out part of the domain, though it doesn't have open boundaries), hypothetically we could get bit-for-bit reproducibility with enough time resolution/output at the boundary. However, if the regional subset does not return output exactly at the boundaries, this reproducibility is impossible to achieve. For integer refinements > 1, if the extracted line has a 'jag' in it (i.e. the cells at the boundary are not direct U-point or V-point neighbors, the physical state of the boundary is not consistent and spurious divergences would be calculated.

@wrongkindofdoctor, @thomas-robinson : would it be a good time to split this into two issues to better keep track of the conversations? 1) Bugfix: regional subsetting based on geographic extent for the tripolar grid 2) Feature request: Option to subset of domain to be specified by grid indices

Again thank you to the two of you for continuing the discussion

wrongkindofdoctor commented 5 years ago

@ashao Yes, I think we need another issue, so please open a new one tagged as feature request for indexing based on the geographic extent of the tripolar grid (or however you think the request should be titled for clarity).

The current issue (the option to subset a domain on the tripolar grid) would be an enhancement of the new indexing scheme, so I'l remove the bug tag.

ashao commented 5 years ago

@wrongkindofdoctor, I think it's actually the opposite way around :). This one is the bug-fix and the new one will be a feature request. Really shows how far off I've dragged the conversation.

ashao commented 5 years ago

Just to really reorient the conversation here: It seems like to support the tripolar grid, the 2d-arrays specifying the lat,lon of every grid point need to be passed in. There is already precedent for this because the 'cubed sphere' is already supported. I wonder if that routine could be generalized to work with any irregular grid.

thomas-robinson commented 1 year ago

@uramirez8707 has added this feature to the new diag manager. You can see it's type here https://github.com/NOAA-GFDL/FMS/blob/b0334fca4e92607d6e8bb68d509b17de62754edd/diag_manager/fms_diag_yaml.F90#L77-L85 . You will be able to pass in the grid_type as index_gridtype and then give the i,j values. I believe you set it in the diag_table.yaml file.

ashao commented 1 year ago

@thomas-robinson @uramirez8707: that's great! Looking forward to trying this out. Feel free to close out this issue if you feel that it's appropriate.

uramirez8707 commented 6 months ago

Fixed in #1505

NOAA-GFDL / FMS

diag_manager: Regional subsetting for u-point variables does not seem to be using correct grid #93

Example code to find regional indices