PCMDI / cmor

Climate Model Output Rewriter
BSD 3-Clause "New" or "Revised" License

Creating a longitude axis with bounds fails for 0.05 degree global grid #146

Closed JimBiardCics closed 7 years ago

JimBiardCics commented 7 years ago

I have a dataset with a global 0.05 deg lat/lon grid. The lat and lon variables in the input netCDF file are 32-bit floats. When I attempt to create a CMOR axis from the CDMS2 axis using the bounds calculated by CDMS2, I get errors and a warning like

C Traceback:
! In function: cmor_check_monotonic
! called from: cmor_treat_axis_values
! called from: cmor_axis
! 

!!!!!!!!!!!!!!!!!!!!!!!!!
!
! Error: axis longitude (table: Omon) has overlapping bounds values:
! 179.950012, 180.000031, 180.000000 at index: 7198
!
!!!!!!!!!!!!!!!!!!!!!!!!!

C Traceback:
! In function: cmor_check_monotonic
! called from: cmor_treat_axis_values
! called from: cmor_axis
! 

!!!!!!!!!!!!!!!!!!!!!!!!!
!
! Error: axis longitude (table: Omon) has overlapping bounds values:
! 180.000031, 180.000000, 180.050018 at index: 7199
!
!!!!!!!!!!!!!!!!!!!!!!!!!

C Traceback:
! In function: cmor_check_monotonic
! called from: cmor_treat_axis_values
! called from: cmor_axis
! 

!!!!!!!!!!!!!!!!!!!!!!!!!
!
! Warning: axis longitude (table: Omon) has bounds values that leave gaps
! (index 7198): 179.950012, 180.000031, 180.000000
!
!!!!!!!!!!!!!!!!!!!!!!!!!

When I inspect the longitude and longitude bounds values, I find that they are all within one 32-bit floating point epsilon of their ideal values.

I've attached a small python program that shows the problem. It has one argument, which is the power of 2 to use to generate an offset to apply to the longitude values when creating them. If you run the program with no argument, it applies no offset. The program assumes that you have a Tables folder in the current working directory.

For values around 180.0, one 32-bit floating point epsilon is 2^-17. To see the impact of this small change, run the program as

python cmor_grid_problem.py

then run the program as

python cmor_grid_problem.py -17

cmor_grid_problem.py.txt

taylor13 commented 7 years ago

I don't think this is a CMOR problem, is it? I think something is wrong with the CDMS2 code used to create the bounds. In this case the 2nd bound of one cell should be identical to the 1st bound of the neighboring cell. This doesn't appear to be the case.

For CMIP6 we require coordinate values and bounds be carried as 64-bit floats (double precision), which would at least reduce the amount of overlap of the cells, I should think; but CDMS has to be smart enough to assign identical values to the bound between 2 grid cells no matter which cell is considered and no matter what the precision of the variable storing these bounds.

JimBiardCics commented 7 years ago

@taylor13 That's a good point that CDMS2 should produce matched bounds, but since there's no guarantee of double precision in an incoming dataset, it seems that CMOR should be more forgiving. I have had a hard time figuring out exactly what CMOR is claiming the problem is, because I don't see all the values from the error reports in my longitude and bounds values. Some of it seems to arise from looking at the cyclical boundary at +-180 degrees, but it seems to me that one 32-bit epsilon of difference shouldn't be enough to sink the ship.

All that to say that I think there are issues on both ends. A 0.05 longitude step size presents a difficult problem, because it is not perfectly representable in any float, 32-bit or 64-bit. The 1/5 buried in it makes everything ugly. This problem won't arise with step sizes that are powers of 2, nor with integer step sizes. CDMS2 should do a better job, and CMOR should be more forgiving.
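The representability point is easy to demonstrate; a short numpy sketch (mine, not from the attached program):

```python
import numpy as np

# 0.05 (i.e. 1/20) has no exact binary representation, so the value
# actually stored in a float32 differs from the decimal it stands for.
err32 = abs(float(np.float32(0.05)) - 0.05)

# The spacing between adjacent float32 values near 180.0 is 2**-16,
# so discrepancies of ~1e-5 there sit at the limit of what float32 resolves.
spacing180 = float(np.spacing(np.float32(180.0)))

print(err32, spacing180)
```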


JimBiardCics commented 7 years ago

I've produced an updated example program that optionally forces the bounds to be butt-joined. CMOR still generates errors when an offset of 2^-17 is added, so there appears to be a definite CMOR problem as well. The new program takes two arguments, a clean/noclean argument and the offset exponent, with an exponent value of zero producing an offset of zero. Run the program with no arguments to get a help message.

cmor_grid_problem2.py.txt

dnadeau4 commented 7 years ago

@JimBiardCics CDMS looks for data that are right at the dateline (-180, 180); you cannot pass the dateline or you get a crossover. When I create the bounds, if the last value in your longitude plus dx/2 is more than 180, I fudge the value to 180.

The problem is really that you create data that go beyond 180 (180.000031) while the last bound is set to 180. I agree that the message in CDMS about not being monotonic could be a bit more explicit. This happens often with the time axis as well, and it is very hard to find the issue.

I don't want to make it more forgiving, for this would mean I have to change the data created by the user. People have very strong opinions about their data, especially land people with very high resolution projections. You could use numpy.round to round your data to 2 digits when you create them.

dnadeau4 commented 7 years ago

Jim, this clever way to create data always works for me. People have always told me that bounds are more important than the center lat/lon, so they create the bounds first and the lat/lon centers next.

import numpy

# Edges first: 7201 edges define 7200 cells.
xs = numpy.linspace(-180.0, 180.0, 7201)
xbounds = numpy.array([xs[0:-1], xs[1::]]).T
# Cell centers are the midpoints of the bounds.
values = (xbounds[:, 1] + xbounds[:, 0]) / 2
values2 = values + offset   # offset as in the attached test program

diff = values2 - values
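Built this way, the shared edge between neighboring cells is literally the same stored number. A quick check (a sketch, separate from the snippet above) confirms the bounds are exactly butt-joined:

```python
import numpy as np

# Bounds first: 7201 edges define 7200 cells.
xs = np.linspace(-180.0, 180.0, 7201)
xbounds = np.array([xs[0:-1], xs[1::]]).T

# The upper bound of cell i and the lower bound of cell i+1 come from the
# same element of xs, so they are bit-for-bit identical.
assert (xbounds[1:, 0] == xbounds[:-1, 1]).all()

# The endpoints are exact because linspace pins them to the given limits.
assert xbounds[0, 0] == -180.0 and xbounds[-1, 1] == 180.0
```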
JimBiardCics commented 7 years ago

@dnadeau4 Did you mean to write CMOR above instead of CDMS2? I'm not creating data that falls over the bounds; CDMS2.getBounds() is. If you check by running my program with the arguments 'noclean 0', you'll see that the first and last bounds values produced by CDMS2 are both over the +-180 degree cut line (-180.000007629394531, 180.000007629394531), yet they don't produce an error. There's something else going on. Run my program with arguments 'clean -17', and you'll see that the first and last bounds are now (-180.000015258789062, 180.0). So perhaps the test is sensitive to overlaps greater than 0.00001 degrees.

I don't understand why making the failing test more forgiving (allowing overlaps of 1.0e-4 degrees, for example) implies changing anyone's data. The problem I am having is that my input longitude data is 32-bit floating point, has a slight offset (perhaps due to the source architecture), and has a step size of 1/20 degree. The cells are centered at half steps, so the values are at multiples of 0.025. This means that some of my longitude values are not representable in exact form in 32-bit or 64-bit floats on account of the factor of 1/5. I don't think that rounding to two or three digits will fix the problem. It might if I first force the values into 64-bit floating point.

JimBiardCics commented 7 years ago

@dnadeau4 I'm not creating the data in my actual program. I'm reading it from input files.

dnadeau4 commented 7 years ago

@taylor13 @doutriaux1 @durack1 -- I have been looking at genGenericBounds(), and I am voting to remove autobounds completely from CDMS.

Let people generate their own bounds. This has been a recurring issue with autobounds, and it seems that turning it off by default is not enough. The computation is right, but the floating point mantissa cannot handle that kind of precision, especially in Python.

I tried with double and I still have the same issue...

def genGenericBounds(self):
    ...
    leftPoint = numpy.array([1.5*ar[0] - 0.5*ar[1]], dtype=numpy.float64)
    midArray = (ar[0:-1] + ar[1:])/2.0
    rightPoint = numpy.array([1.5*ar[-1] - 0.5*ar[-2]], dtype=numpy.float64)
    bnds = numpy.concatenate((leftPoint, midArray, rightPoint))
    retbnds = numpy.zeros((len(ar), 2), dtype=numpy.float64)
    retbnds[:, 0] = bnds[:-1]
    retbnds[:, 1] = bnds[1:]

    if self.isLatitude():
        retbnds[0, :] = numpy.maximum(-90.0, numpy.minimum(90.0, retbnds[0, :]))
        retbnds[-1, :] = numpy.maximum(-90.0, numpy.minimum(90.0, retbnds[-1, :]))
    return retbnds

I see that we ensure -90 to 90 in latitude, but longitude can be -180 to 360...
We could look at what the minimum value is and force 0-360 when it is positive and -180 to 180 when it is negative, but some grids are "-90 to 270".

dnadeau4 commented 7 years ago

@JimBiardCics You are using CDMS to create the bounds.

cdms2.setAutoBounds('on')
bounds = axis.getBounds()

You could also create your bounds and do

cdms2.setAutoBounds('off')
bounds=my_bounds
axis.setBounds(bounds)
durack1 commented 7 years ago

@dnadeau4 I don't think removing this functionality from cdms is the right step forward; adding a warning could be suitable, but it works well for standard lon/lat grids and I use it a fair bit as an end-user. Happy to hear from @doutriaux1, @taylor13 and @gleckler1 on this

dnadeau4 commented 7 years ago

If you can figure out a better equation, I will be happy to implement it. I am still trying to figure this out... I was supposed to do @taylor13's CMOR list today...

dnadeau4 commented 7 years ago

I was able to get CMOR to accept the bounds if I round them to 4 decimals (1e-4) using @JimBiardCics's program with -17. When CMOR converts the longitude bounds to 0-360, the dateline becomes non-monotonic in this case due to a floating point precision issue.

dnadeau4 commented 7 years ago

If I round the limits of the automatically generated bounds to 4 decimals (1e-4), it works in this case....

retbnds[0,0] = round(retbnds[0,0],4)
retbnds[-1,1] = round(retbnds[-1,1],4)
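As a quick sanity check on the choice of 4 decimals (a sketch, not CDMS code): an offset of one float32 spacing near 180 degrees disappears at 4 decimal places but not at 5:

```python
# One float32 spacing near 180 degrees is 2**-16 (0.000015258789062...).
eps = 2.0 ** -16
bad = 180.0 + eps

# Rounding to 4 decimal places shaves the offset off entirely...
assert round(bad, 4) == 180.0

# ...while 5 decimal places keep a residue, so the overlap survives.
assert round(bad, 5) != 180.0
```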
dnadeau4 commented 7 years ago

I have done this in CDMS.

durack1 commented 7 years ago

@dnadeau4 if you wanted to test this, there are a very large number of CMIP5 model datasets that you could run it against..

durack1 commented 7 years ago

@dnadeau4 how far can you push that, would 5 or 6 decimal places also work? I'm just trying to think about very high res cases where the decimal place matters..

dnadeau4 commented 7 years ago

I will have to cmorize your large number of CMIP5 model datasets. Nice try! :smirk:

durack1 commented 7 years ago

@dnadeau4 no, not rewrite any data, but just use the additional grids in your test case.. I've seen some very weird coordinate numbers in the CMOR-written CMIP5 datasets

dnadeau4 commented 7 years ago

5 does not work; the precision is 0.000015258789062 for 2^-17, which is the limit of the mantissa.

dnadeau4 commented 7 years ago

I'd like to see your grids and figure out what happened.

dnadeau4 commented 7 years ago

I think that is a good workaround and will push it to master. @durack1 @taylor13 @doutriaux1 Do you agree?

retbnds[0,0] = round(retbnds[0,0],4)
retbnds[-1,1] = round(retbnds[-1,1],4)
durack1 commented 7 years ago

@dnadeau4 4th decimal place precision has an effective resolution of 0.01 km which I believe is suitable for all examples I can think of (I say this with a very global perspective though).. I have no idea what data would have a higher resolution than even 0.1 degree (~11 km) which is considerably more crude than your limit

JimBiardCics commented 7 years ago

@dnadeau4

Would this change be in CDMS2 or CMOR?


dnadeau4 commented 7 years ago

@JimBiardCics This will change how CDMS2 creates bounds, making sure to round the first and last bounds so that this computer precision issue does not arise.

JimBiardCics commented 7 years ago

Denis,

Here is a bit of code I wrote to clean up the bounds returned by the axis getBounds method.

import numpy

def getBounds(axis):
    '''
    Get clean bounds for the axis.
    Warning: This won't work for climatological time bounds.

    axis [in] A cdms2 axis object.
    returns   A 2-D bounds array with a shape of (N, 2).
    '''

    # Get the bounds from the axis. If the axis doesn't already have bounds
    # associated with it, this call calculates some.
    #
    bounds = axis.getBounds()

    # If the first bounds don't seem reasonable, fix them.
    #
    if bounds[0,0] == bounds[0,1]:
        bounds[0,1] = bounds[1,0]
        bounds[0,0] = 2.0 * bounds[1,0] - bounds[1,1]

    # If the last bounds don't seem reasonable, fix them.
    #
    if bounds[-1,0] == bounds[-1,1]:
        bounds[-1,0] = bounds[-2,1]
        bounds[-1,1] = 2.0 * bounds[-2,1] - bounds[-2,0]

    # The values calculated by CDMS2 aren't always very good for some reason,
    # so force the bounds to be butt-joined.
    #
    bounds[1:,0] = bounds[:-1,1]

    # If this is a longitude axis and the edge bounds overlap, fix them so they
    # don't.
    #
    if axis.isLongitude():
        upperBound = bounds.max()
        lowerBound = bounds.min()

        overlap = upperBound - lowerBound - 360.0

        if 0.0 < overlap:
            # Subtract half the overlap from the maximum values, and add half
            # the overlap to the minimum values.
            #
            overlap *= 0.5

            lowerBound += overlap
            upperBound -= overlap

            bounds = numpy.clip(bounds, lowerBound, upperBound)

    # Return the bounds array.
    #
    return bounds
taylor13 commented 7 years ago

@dnadeau4 Here's a simple approach that will ensure (in whatever precision) that the bounds of contiguous grid cells coincide: Let

C(i) = coordinate values supplied by the user (assumed to be n values, monotonically increasing, but the algorithm could easily be generalized to also handle decreasing)
Bfirst = lower limit of domain (i.e., lower bound of first grid cell) supplied by user
Blast = upper limit of domain (i.e., upper bound of last grid cell) supplied by user

B(i,j) = coordinate bounds returned by the algorithm

Algorithm:

B(1,1) = Bfirst
B(n,2) = Blast

do i=1,n-1
    B(i,2) = (C(i) + C(i+1))/2.
    B(i+1,1) = B(i,2)
enddo

This should work at any precision, even if the calculated values are subsequently converted to a higher or lower precision.
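This algorithm can be rendered directly in Python; a sketch with names of my own choosing, not CDMS code:

```python
import numpy as np

def bounds_from_midpoints(c, bfirst=None, blast=None):
    """Build (n, 2) cell bounds so neighboring cells share identical edges.

    c       monotonically increasing coordinate values
    bfirst  lower bound of the first cell (defaults to 1.5*c[0] - 0.5*c[1])
    blast   upper bound of the last cell (defaults to 1.5*c[-1] - 0.5*c[-2])
    """
    c = np.asarray(c, dtype=np.float64)
    if bfirst is None:
        bfirst = 1.5 * c[0] - 0.5 * c[1]
    if blast is None:
        blast = 1.5 * c[-1] - 0.5 * c[-2]
    edges = np.empty(len(c) + 1, dtype=np.float64)
    edges[0] = bfirst
    edges[-1] = blast
    edges[1:-1] = (c[:-1] + c[1:]) / 2.0   # one shared value per interior edge
    return np.column_stack((edges[:-1], edges[1:]))

b = bounds_from_midpoints([-179.975, -179.925, -179.875], bfirst=-180.0)
# Interior edges coincide exactly because each is a single stored value.
assert (b[1:, 0] == b[:-1, 1]).all()
assert b[0, 0] == -180.0
```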

Two issues:

1) Can we avoid the special treatment of Bfirst and Blast?
2) Are there special issues with the precision of Bfirst and Blast that come into play when longitude and latitude bounds are considered?

Issue 1: Can we avoid the special treatment of Bfirst and Blast?

If the user does not provide Bfirst and Blast, we could use the following algorithm to make up values:

Bfirst = 1.5*C(1) – 0.5*C(2) (i.e., C(1) is centered between its bounds)
Blast = 1.5*C(n) – 0.5*C(n-1) (i.e., C(n) is centered between its bounds)

Issue 2: Are there special issues with the precision of Bfirst and Blast that come into play when longitude and latitude bounds are considered?

If we know a domain has hard limits (e.g., -90 <= lat <= 90) then we should perform a test:

If (coord is lat with units of degrees)
    If (Bfirst < -90.) Bfirst = -90.
    If (Blast > 90.) Blast = 90.
endif

If a code converts from one precision to another, then the above test would have to be repeated at the new precision.

After executing the above code, we could also perform the following test on the lat bounds (to ensure that the domain extends from -90 to 90 when we think it nearly covers the globe):

If (coord is lat with units of degrees)
    If (Bfirst + 90. < 0.01*(C(2)-C(1))) Bfirst = -90.
    If (90. - Blast < 0.01*(C(n)-C(n-1))) Blast = 90.
endif

For longitude, we could similarly generate first and last bounds we think are likely correct by applying the following algorithm (checking whether or not the first and last bounds nearly coincide with either the prime meridian or 180 deg from it):

If (coord is lon with units of degrees)

    ! calculate how far from the prime or prime+180 meridian the first bound is
    dist = mod(Bfirst, 180.)
    If (dist > 90.) dist = dist - 180.

    ! determine if the first bound is near enough a “nice” meridian to be rounded to it
    If (abs(dist) < 0.01*abs(C(2)-C(1))) Bfirst = 180*int(Bfirst/180. + 0.5)

    ! similarly for the last bound
    dist = mod(Blast, 180.)
    If (dist > 90.) dist = dist - 180.
    If (abs(dist) < 0.01*abs(C(n) – C(n-1))) Blast = 180*int(Blast/180. + 0.5)

endif

The above could be generalized to also handle lat and lon in units of radians rather than degrees.

Similar checks should be incorporated into any code that converts coordinate bounds from single to double precision or vice versa, to ensure that the change in precision preserves the global domain.

One final note: we shouldn't arbitrarily exclude any longitude domain. The above would allow domains to extend more than 360 degrees and have values in any range (not limited to ranges near -180, 0, 180, 360).

dnadeau4 commented 7 years ago

@taylor13 and @durack1 Can you review this code? I got rid of the floating point precision error for longitude at the dateline when appropriate. I tried it with Jim's programs and it works in every situation.

        if self.isLongitude():
            # Make sure we have at least a 360 degree interval
            if abs(retbnds[0,0] - retbnds[-1,1]) >= 360.0:
                # Make sure that precision is within 1% of delta lon
                deltaLonEps = (self[1] - self[0])/100.0
                # retrieve the fractional part only (drop the integer part)
                decimalBfirst = abs(retbnds[0,0] - numpy.trunc(retbnds[0,0]))
                decimalBlast = abs(retbnds[-1,1] - numpy.trunc(retbnds[-1,1]))

                # make sure we don't have .999 but something between 0 and 0.5
                if round(decimalBfirst) == 1.0:
                    decimalBfirst = 1 - decimalBfirst
                if round(decimalBlast) == 1.0:
                    decimalBlast = 1 - decimalBlast
                # Find out how many 0s we have after the decimal point
                nbDigits = numpy.trunc(abs(numpy.log10(decimalBfirst)))
                nbDigits = nbDigits.astype(numpy.int32)
                # round to the nth digit
                roundBfirst = numpy.round(retbnds[0,0], nbDigits)
                # Same thing for the last bound
                nbDigits = numpy.trunc(abs(numpy.log10(decimalBlast)))
                nbDigits = nbDigits.astype(numpy.int32)
                roundBlast = numpy.round(retbnds[-1,1], nbDigits)
                diffBfirst = abs(roundBfirst - retbnds[0,0])
                diffBlast = abs(roundBlast - retbnds[-1,1])
                # Cyclical longitude bounds can be shaved to the nth decimal since
                # the difference from the rounded value is within 1% of the longitude step
                if (retbnds[0,0] != roundBfirst) and (diffBfirst < deltaLonEps):
                    warnings.warn("Your first bounds[0,0] %f will be corrected to %f" %
                                  (retbnds[0,0], roundBfirst), UserWarning)
                    retbnds[0,0] = roundBfirst

                if (retbnds[-1,1] != roundBlast) and (diffBlast < deltaLonEps):
                    warnings.warn("Your last bounds[-1,1] %f will be corrected to %f" %
                                  (retbnds[-1,1], roundBlast), UserWarning)
                    retbnds[-1,1] = roundBlast
taylor13 commented 7 years ago

@dnadeau4

I thought CDMS was written in C, not python, but no matter, I don't really know either.

The first ("if" statement) should also check that the units on longitude are "degrees" (or similar, like "degrees_east")

I think we should perform the check even when the difference between the end bounds is just a bit less than 360, because we can assume they wanted global coverage. Also, if the domain is significantly greater than 360, we can't expect codes to properly wrap the data, so I don't think we should adjust the bounds unless we're near 360. I think the code can be shortened to:

If (longitude coord. and units are degrees)

    domain = bnds[-1,1] – bnds[0,0]
    if (abs(abs(domain) - 360.) < Min(0.01, abs(bnds[0,1]-bnds[0,0])*0.1))

        # they probably meant the domain to span exactly 360 deg.

        # initialize logical:
        setbnd = .false.

        # now check whether the first bound is near an integer value
        if (abs(bnds[0,0] - int(bnds[0,0] + 0.5)) < abs(bnds[0,1]-bnds[0,0])*0.01)
            bnds[0,0] = int(bnds[0,0] + 0.5)
            setbnd = .true.

        # now check whether the last bound is near an integer value
        if (abs(bnds[-1,1] - int(bnds[-1,1] + 0.5)) < abs(bnds[-1,1]-bnds[-1,0])*0.01)
            bnds[-1,1] = int(bnds[-1,1] + 0.5)
            setbnd = .true.

        # even if neither bound was near an integer, the difference should be exactly 360
        if (.not. setbnd)
            if (domain > 0)
                bnds[-1,1] = bnds[0,0] + 360.
            else
                bnds[0,0] = bnds[-1,1] + 360.

Note that the "int" function returns the largest integer <= to the passed argument, so int(-1.1) = -2 and int (1.1) = 1

dnadeau4 commented 7 years ago

@taylor13 I'll give it a try...

Actually it is numpy.floor that does this, not int()

>>> numpy.floor(-1.1)
-2.0
>>> numpy.floor(1.1)
1.0
>>> 
taylor13 commented 7 years ago

Reviewing my pseudo-code above (which is not written in any known language but should be clear), I've found ways to simplify it:

If (longitude coord. and units are degrees)

    if (abs(abs(bnds[-1,1] – bnds[0,0]) - 360.) < Min(0.01, abs(bnds[0,1]-bnds[0,0])*0.1))

        # they probably meant the domain to span exactly 360 deg.

        # now check whether either bound is near an integer value; if yes, round both to integers
        if (abs(bnds[0,0] - int(bnds[0,0] + 0.5)) < abs(bnds[0,1]-bnds[0,0])*0.01) .or.
           (abs(bnds[-1,1] - int(bnds[-1,1] + 0.5)) < abs(bnds[-1,1]-bnds[-1,0])*0.01)

            bnds[0,0] = int(bnds[0,0] + 0.5)
            bnds[-1,1] = int(bnds[-1,1] + 0.5)

        else

            # even if neither bound was near an integer, the difference should be exactly 360
            if (bnds[-1,1] > bnds[0,0])
                bnds[-1,1] = bnds[0,0] + 360.
            else
                bnds[0,0] = bnds[-1,1] + 360.

Note that the "int" function returns the largest integer <= to the passed argument, so int(-1.1) = -2 and int (1.1) = 1
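A rough Python translation of the pseudo-code (an illustration only; the function name and list-of-lists layout are mine, and math.floor plays the role of the "int" described in the note):

```python
import math

def snap_global_lon_bounds(bnds):
    """If the bounds span is within tolerance of 360 degrees, snap the end
    bounds so the span becomes exactly 360 (a sketch of the scheme above)."""
    cell = abs(bnds[0][1] - bnds[0][0])          # width of the first cell
    span = bnds[-1][1] - bnds[0][0]
    if abs(abs(span) - 360.0) < min(0.01, cell * 0.1):
        # They probably meant the domain to span exactly 360 degrees.
        def near_int(x, tol):
            # floor(x + 0.5) is the nearest integer, per the note above
            return abs(x - math.floor(x + 0.5)) < tol
        if near_int(bnds[0][0], cell * 0.01) or near_int(bnds[-1][1], cell * 0.01):
            # Either end bound is near an integer: round both to integers.
            bnds[0][0] = float(math.floor(bnds[0][0] + 0.5))
            bnds[-1][1] = float(math.floor(bnds[-1][1] + 0.5))
        elif span > 0:
            bnds[-1][1] = bnds[0][0] + 360.0
        else:
            bnds[0][0] = bnds[-1][1] + 360.0
    return bnds

# The overlapping-dateline case from this issue: end bounds a hair past 180.
b = snap_global_lon_bounds([[-180.000015258789062, -179.95],
                            [179.95, 180.000015258789062]])
assert b[0][0] == -180.0 and b[-1][1] == 180.0
```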

dnadeau4 commented 7 years ago

I submitted a pull request to CDMS yesterday with the latest modification. It should solve this bounds issue for CMOR.

https://github.com/UV-CDAT/cdms/pull/109