CDAT / cdms

9 stars 10 forks source link

File open fails when lon and lat are variables (not dimensions) #32

Closed durack1 closed 7 years ago

durack1 commented 8 years ago

The attached file is unreadable in the current version of cdms2 (v2.8). I believe the issue is that the variables (not dimensions) lon and lat included in the file are assumed to be standard dimensions (which they aren't), and these assumptions lead to a read failure.

cdmsReadFail.zip

The file looks like:

ncdump -h 161122_RobertPincus_multiple_input4MIPs_radiation_RFMIP_UColorado-RFMIP-20161122_none.nc 
netcdf \161122_RobertPincus_multiple_input4MIPs_radiation_RFMIP_UColorado-RFMIP-20161122_none {
dimensions:
    expt = 18 ;
    level = 61 ;
    layer = 60 ;
    site = 100 ;
variables:
    float lon(site) ;
        lon:long_name = "ERA-Interim longitude" ;
        lon:units = "degree_north" ;
        lon:standard_name = "longitude" ;
    float lat(site) ;
        lat:long_name = "ERA-Interim latitude" ;
        lat:units = "degree_east" ;
        lat:standard_name = "latitude" ;
    float time(site) ;
        time:long_name = "ERA-Interim fractional day of the year 2014" ;
        time:units = "days since 2014-1-1 0:0:0" ;
        time:standard_name = "time" ;
        time:calendar = "gregorian" ;
    float sst(site) ;
        sst:title = "sea surface temperature" ;
        sst:units = "K" ;
        sst:long_name = "ERA-Interim sea surface temperature (= \"missing_value\" over land)" ;
        sst:standard_name = "sea_surface_temperature" ;
        sst:missing_value = -9.99f ;
        sst:FillValue = -9.99f ;
        sst:coordinates = "lon lat time" ;
...

And the read failure looks like:

In [1]: infile = '161122_RobertPincus_multiple_input4MIPs_radiation_RFMIP_UColorado-RFMIP-20161122_none.nc'
In [2]: import cdms2 as cdm
In [3]: f = cdm.open(infile)
---------------------------------------------------------------------------
CDMSError                                 Traceback (most recent call last)
<ipython-input-3-da3bd29babf2> in <module>()
----> 1 f = cdm.open(infile)
/uvcdat280/lib/python2.7/site-packages/cdms2/dataset.pyc in openDataset(uri, mode, template, dods, dpath, hostObj)
    357 
    358             # The file exists
--> 359             file1 = CdmsFile(path, "r")
    360             if libcf is not None:
    361                 if hasattr(file1, libcf.CF_FILETYPE):
/uvcdat280/lib/python2.7/site-packages/cdms2/dataset.pyc in __init__(self, path, mode, hostObj, mpiBarrier)
   1137                 # Get grid information for the variable. gridkey has the form
   1138                 # (latname,lonname,order,maskname, abstract_class).
-> 1139                 gridkey, lat, lon = var.generateGridkey(self._convention_, self.variables)
   1140 
   1141                 # If the variable is gridded, lookup the grid. If no such grid exists,
/uvcdat280/lib/python2.7/site-packages/cdms2/avariable.pyc in generateGridkey(self, convention, vardict)
    239         lon, nlon = convention.getVarLonId(self, vardict)
    240         if (lat is not None) and (lat is lon):
--> 241             raise CDMSError, "Axis %s is both a latitude and longitude axis! Check standard_name and/or axis attributes."%lat.id
    242 
    243         # Check for 2D grid
CDMSError: Axis lon is both a latitude and longitude axis! Check standard_name and/or axis attributes.
doutriaux1 commented 8 years ago

when did you create your conda env? I uploaded an updated cdms2 last night.

durack1 commented 8 years ago

@doutriaux1 here we go:

(uvcdat280) bash-3.2$ ls -ald /uvcdat280/lib/python2.7/site-packages/cdms2
drwxr-xr-x  100 duro  THE-LAB\Domain Users  3400 Nov 16 15:33 /uvcdat280/lib/python2.7/site-packages/cdms2
(uvcdat280) bash-3.2$ conda update cdms2
Fetching package metadata .......
# All requested packages already installed.
# packages in environment at /uvcdat280:
#
cdms2                     2.8                 np111py27_2    uvcdat
doutriaux1 commented 8 years ago

please do a force

durack1 commented 8 years ago

@doutriaux1 same issue, it's a problem with cdms:

(uvcdat280) bash-3.2$ conda update cdms2 --force
Fetching package metadata .......
Solving package specifications: ..........
Package plan for installation in environment /uvcdat280:
The following NEW packages will be INSTALLED:
    cdms2: 2.8-np111py27_2 uvcdat
Proceed ([y]/n)? y
Pruning extracted packages from the cache ...
[      COMPLETE      ]|########################################################################################################################################################| 100%
Extracting packages ...
[      COMPLETE      ]|########################################################################################################################################################| 100%
Linking packages ...
(uvcdat280) bash-3.2$ ipython
Python 2.7.12 | packaged by conda-forge | (default, Sep  8 2016, 14:41:48) 
Type "copyright", "credits" or "license" for more information.
IPython 5.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
In [1]: import cdms2 as cdm
In [2]: infile = '161122_RobertPincus_multiple_input4MIPs_radiation_RFMIP_UColorado-RFMIP-20161122_none.nc'
In [3]: f = cdm.open(infile)
---------------------------------------------------------------------------
CDMSError                                 Traceback (most recent call last)
<ipython-input-3-da3bd29babf2> in <module>()
----> 1 f = cdm.open(infile)
/uvcdat280/lib/python2.7/site-packages/cdms2/dataset.pyc in openDataset(uri, mode, template, dods, dpath, hostObj)
    357 
    358             # The file exists
--> 359             file1 = CdmsFile(path, "r")
    360             if libcf is not None:
    361                 if hasattr(file1, libcf.CF_FILETYPE):
/uvcdat280/lib/python2.7/site-packages/cdms2/dataset.pyc in __init__(self, path, mode, hostObj, mpiBarrier)
   1137                 # Get grid information for the variable. gridkey has the form
   1138                 # (latname,lonname,order,maskname, abstract_class).
-> 1139                 gridkey, lat, lon = var.generateGridkey(self._convention_, self.variables)
   1140 
   1141                 # If the variable is gridded, lookup the grid. If no such grid exists,
/uvcdat280/lib/python2.7/site-packages/cdms2/avariable.pyc in generateGridkey(self, convention, vardict)
    239         lon, nlon = convention.getVarLonId(self, vardict)
    240         if (lat is not None) and (lat is lon):
--> 241             raise CDMSError, "Axis %s is both a latitude and longitude axis! Check standard_name and/or axis attributes."%lat.id
    242 
    243         # Check for 2D grid
CDMSError: Axis lon is both a latitude and longitude axis! Check standard_name and/or axis attributes.
durack1 commented 8 years ago

@doutriaux1 curiously, in a previous (and not identical) version of the file, if the lon:units were changed from "degree_north" to "degree north" (remove the underscore) it seemed to be readable by the checker/cdms

@dnadeau4 pinging you into this thread

dnadeau4 commented 8 years ago

degree_north and degree_east are accepted and CDMS2 should work with this file. cf-conventions

I will enhanced CDMS2 to make sure your lat/lon are detected correctly

durack1 commented 8 years ago

@dnadeau4 out of curiosity, I tried to update the file to include lon:axis = "X" and similar for lat and I still get the identical error:

[durack1@oceanonly RFMIP]$ ncdump -h test.nc | more                                                                                  
netcdf test {                                                                                                                        
dimensions:                                                                                                                          
        expt = 18 ;                                                                                                                  
        level = 61 ;                                                                                                                 
        layer = 60 ;                                                                                                                 
        site = 100 ;                                                                                                                 
variables:                                                                                                                           
        float lon(site) ;                                                                                                            
                lon:long_name = "ERA-Interim longitude" ;                                                                            
                lon:units = "degree_north" ;                                                                                         
                lon:standard_name = "longitude" ;                                                                                    
                lon:axis = "X" ;                                                                                                     
        float lat(site) ;                                                                                                            
                lat:long_name = "ERA-Interim latitude" ;                                                                             
                lat:units = "degree_east" ;                                                                                          
                lat:standard_name = "latitude" ;                                                                                     
                lat:axis = "Y" ;                                                                                                     
        float time(site) ;                                                                                                           
                time:long_name = "ERA-Interim fractional day of the year 2014" ;                                                     
                time:units = "days since 2014-1-1 0:0:0" ;                                                                           
                time:standard_name = "time" ;                                                                                        
                time:calendar = "gregorian" ;                                                                                        
        float sst(site) ;                                                                                                            
                sst:title = "sea surface temperature" ;                                                                              
                sst:units = "K" ;                                                                                                    
                sst:long_name = "ERA-Interim sea surface temperature (= \"missing_value\" over land)" ;                              
                sst:standard_name = "sea_surface_temperature" ;                                                                      
                sst:missing_value = -9.99f ;                                                                                         
                sst:FillValue = -9.99f ;                                                                                             
                sst:coordinates = "lon lat time" ;

And the error:

(uvcdatNightly) duri@ocean:[RFMIP]:[2858]> ipython
Python 2.7.12 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:42:40)
Type "copyright", "credits" or "license" for more information.
IPython 5.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
In [1]: import cdms2 as cdm
In [2]: infile = 'test.nc'
In [3]: f = cdm.open(infile)
---------------------------------------------------------------------------
CDMSError                                 Traceback (most recent call last)
<ipython-input-3-da3bd29babf2> in <module>()
----> 1 f = cdm.open(infile)
/uvcdatNightly/lib/python2.7/site-packages/cdms2/dataset.pyc in openDataset(uri, mode, template, dods, dpath, hostObj)
    307
    308             # The file exists
--> 309             file1 = CdmsFile(path,"r")
    310             if libcf is not None:
    311                 if hasattr(file1, libcf.CF_FILETYPE):
/uvcdatNightly/lib/python2.7/site-packages/cdms2/dataset.pyc in __init__(self, path, mode, hostObj, mpiBarrier)
   1063                 # Get grid information for the variable. gridkey has the form
   1064                 # (latname,lonname,order,maskname, abstract_class).
-> 1065                 gridkey, lat, lon = var.generateGridkey(self._convention_, self.variables)
   1066
   1067                 # If the variable is gridded, lookup the grid. If no such grid exists,
/uvcdatNightly/lib/python2.7/site-packages/cdms2/avariable.pyc in generateGridkey(self, convention, vardict)
    236         lon, nlon = convention.getVarLonId(self, vardict)
    237         if (lat is not None) and (lat is lon):
--> 238             raise CDMSError, "Axis %s is both a latitude and longitude axis! Check standard_name and/or axis attributes."%lat.id
    239
    240         # Check for 2D grid
CDMSError: Axis lon is both a latitude and longitude axis! Check standard_name and/or axis attributes.
durack1 commented 8 years ago

@dnadeau4 if you amend the lon:units = "degree north" (remove the underscore) it loads fine:

(uvcdatNightly) duro@ocean:[RFMIP]:[2858]> ipython
Python 2.7.12 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:42:40)
Type "copyright", "credits" or "license" for more information.
IPython 5.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
In [1]: import cdms2 as cdm
In [2]: infile = 'test_lonUnitTweak.nc'
In [3]: f = cdm.open(infile)
In [4]:

And for completeness the ncdump of the top of the file:

netcdf test_lonUnitTweak {
dimensions:
        expt = 18 ;
        level = 61 ;
        layer = 60 ;
        site = 100 ;
variables:
        float lon(site) ;
                lon:long_name = "ERA-Interim longitude" ;
                lon:standard_name = "longitude" ;
                lon:axis = "X" ;
                lon:units = "degree north" ;
        float lat(site) ;
                lat:long_name = "ERA-Interim latitude" ;
                lat:units = "degree_east" ;
                lat:standard_name = "latitude" ;
                lat:axis = "Y" ;
durack1 commented 7 years ago

@dnadeau4 just wondering where this issue is sitting? We're hoping to get @RobertPincus's data published in the input4MIPs project, however the publisher/cdms can't deal with his data currently..

dnadeau4 commented 7 years ago

Nothing yet, I am working on issue #7. @doutriaux1 gave me a priority list check label #priority and talk with @doutriaux1.

durack1 commented 7 years ago

@sashakames with #76 it seems that @dnadeau4 has solved this issue and so this version of bug-fixed cdms will need to be pulled across so the ESGF publisher can open and publish the file (attached above). Once this PR is merged, can we prioritise this bug fix patch into ESGF?

dnadeau4 commented 7 years ago

@durack1 this is now working. The issue was the wrong units for latitude/longitude. I used nco to change them and now cdms2 works.

ncatted -O -h -a units,lon,o,c,degree_east ~/Downloads/161122_RobertPincus_multiple_input4MIPs_radiation_RFMIP_UColorado-RFMIP-20161122_none.nc
ncatted -O -h -a units,lat,o,c,degree_north ~/Downloads/161122_RobertPincus_multiple_input4MIPs_radiation_RFMIP_UColorado-RFMIP-20161122_none.nc
durack1 commented 7 years ago

@dnadeau4 the issue wasn't incorrect units, these were changed to prevent the CF checker (which uses a much earlier version of "chat-lite") from crashing.. I confirmed the same behaviour in the 2.8 cdms release and hence this issue

durack1 commented 7 years ago

@dnadeau4 if you see the original ncdump output, these are the same as your ncatted edits above

dnadeau4 commented 7 years ago

@durack1 I was able to work with these files when I changed the units. Why is longitudes degrees_north in the original ncdump?

dnadeau4 commented 7 years ago

degree is an acceptable unit in CF-1 conventions. When you set degree north I assumed CDMS used the first word degree. But if you set longitude to degree_north the command isLatitude returns true. That is why you get the error that lat is equal to lon.

dnadeau4 commented 7 years ago

longitude needs to be set to degree_east

sashakames commented 7 years ago

Once ESGF has CDMS imported via a CONDA python, we will have this and any other cdms2 features. Targeting the next major release --we have one this/next week but too soon for inclusion now.

-Sasha

From: "Paul J. Durack" notifications@github.com Reply-To: UV-CDAT/cdms reply@reply.github.com Date: Sunday, January 15, 2017 at 1:38 PM To: UV-CDAT/cdms cdms@noreply.github.com Cc: Sasha Ames sasha@llnl.gov, Mention mention@noreply.github.com Subject: Re: [UV-CDAT/cdms] File open fails when lon and lat are variables (not dimensions) (#32)

@sashakameshttps://github.com/sashakames with #76https://github.com/UV-CDAT/cdms/pull/76 it seems that @dnadeau4https://github.com/dnadeau4 has solved this issue and so this version of bug-fixed cdms will need to be pulled across so the ESGF publisher can open and publish the file (attached above). Once this PR is merged, can we prioritise this bug fix patch into ESGF?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/UV-CDAT/cdms/issues/32#issuecomment-272725824, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AHf2-5HQJux9qSIMouYeGhyF2_bN3KO_ks5rSpHggaJpZM4K5v2A.

durack1 commented 7 years ago

dnadeau4 ok now I get you.. So the unit values were flipped lon:units = "degree_north" ; should have been lon:units = "degree_east" ; and likewise for lat.. I still think the software should fix (or at the very least point out) dumb user errors like this.. So checking standard_name or longname for longitude or latitude string matches rather than bombing like it did and not specifically pointing out that the units for the dimensions (longitude and latitude) are incompatible with the software (and conventions)

I wonder if there would be a way, in the case of a failure testing the units against the possible values and returning a "units invalid" error, rather than ungracefully exiting as it does. The fact that cdms can't read this file (due to the units error) is kinda worrying to me..

durack1 commented 7 years ago

@dnadeau4 you might want to create a user error tag, and assign it here.. I know @doutriaux1 likes that label alot

doutriaux1 commented 7 years ago

@dnadeau4 I think @durack1 is right here, can we catch the error and return a user understandable error here. Thanks.

doutriaux1 commented 7 years ago

@durack1 see, as I 'm getting older I become weaker (and wiser?) and I'm willing to listen to users even when they are caught in a flagrant user error

jypeter commented 7 years ago

That's good @doutriaux1, because our new users not only do not read the docs, but they tend to assume that things will work as they expect (regardless of what the functions are actually supposed to do). They must believe in magic and fairy tales

durack1 commented 7 years ago

@doutriaux1 that robust tag may indeed get awarded one day.. My philosophy here is to fix what you can irrespective of how trivial the error is.. The fact that the units were flipped avoided the eyes of a number of folks before @dnadeau4 correctly pointed the issue out.. I think the software should have pointed this out first and suggested (or automagically implemented) a fix.

doutriaux1 commented 7 years ago

@durack1 agreed the software should point this out to the user, I am personally STRONGLY opposed to to fixing things automatically for the user. I refuse to assume that I am smarter than the user, there might be a good reason for this that we don't know of. All we can do is let the user know that we can't go forward with the data the way it is.

dnadeau4 commented 7 years ago

I guess I could try to use "standard_name" if "units" is not right for variables lat/lon and warn the user.

If there is an attribute axis='Y' then we are done! To pass the test, the variable must start wit 'lat' (mixed cases are fine).

    def isLatitude(self):                                                                                                                                    
        id = self.id.strip().lower()                                                                                                                         
        if (hasattr(self,'axis') and self.axis=='Y'): return 1                                                                                               
        units = getattr(self,"units","").strip().lower()                                                                                                     
        if units in ["degrees_north","degree_north","degree_n","degrees_n","degreen","degreesn"]:                                                            
          return 1                                                                                                                                           
        return (id[0:3] == 'lat') or (id in latitude_aliases)                                                                                                
durack1 commented 7 years ago

@dnadeau4 there is always the range check approach too, so for a valid latitude this can only occupy the range between -90.0 and 90.0, and for longitude this can only be -180.0 to 360.0 any other values are an invalid variable

The preferred behaviour to me would be for these CF variables and attributes to be set if they can be found (the current searches are fine), however if anything fails, rather than aborting the read, throw a descriptive error and return the file handle. This way a user is still able to operate on the file, but won't have much (or any?) of the cdms transientVariable magical stuff to work with. In the test case above this would allow me for example to work with the file and its contents, and what I would do is then read out all the information/variables from the file into memory, and then use cdms to rewrite the file correctly

doutriaux1 commented 7 years ago

range depends on the units though. But it's probably worth adding for known units.

doutriaux1 commented 7 years ago

we can't enforce a valid range for longitudes though

durack1 commented 7 years ago

@doutriaux1 why no valid range for longitudes? These values have to sit within the range -180 to 360 degrees. That deals with the -180 to 180 grids and the 0 to 360 grids too, what other ranges are required?

dnadeau4 commented 7 years ago

@doutriaux1 Can CDMS read other projections like Mercator? If so, I cannot check the range the @durack1 asked.

doutriaux1 commented 7 years ago

@dnadeau4 as i stated range depends on units, I think mercator and such will have units in m or something like that. So you could check range based on units degrees_north or radians or such