CDAT / cdms


Can't read CF-1.5 compliant GPCP data ... #105

Open gleckler1 opened 7 years ago

gleckler1 commented 7 years ago

Data can be retrieved via wget from the following site: http://eagle1.umd.edu/GPCP_ICDR/GPCP_Monthly.html

The issue appears to be the dimension of the lat and lon bounds, e.g., lat_bounds(nlat, 2)

import cdms2

f = cdms2.open('gpcp_cdr_v23rB1_y2016_m06.nc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/export/gleckler1/anaconda2/envs/pmp_012617a/lib/python2.7/site-packages/cdms2/dataset.py", line 359, in openDataset
    file1 = CdmsFile(path, "r")
  File "/export/gleckler1/anaconda2/envs/pmp_012617a/lib/python2.7/site-packages/cdms2/dataset.py", line 1197, in __init__
    grid = FileGenericGrid(lat, lon, gridname, parent=self, maskvar=maskvar)
  File "/export/gleckler1/anaconda2/envs/pmp_012617a/lib/python2.7/site-packages/cdms2/gengrid.py", line 305, in __init__
    AbstractGenericGrid.__init__(self, latAxis, lonAxis, id, maskvar, tempmask, node)
  File "/export/gleckler1/anaconda2/envs/pmp_012617a/lib/python2.7/site-packages/cdms2/gengrid.py", line 22, in __init__
    raise CDMSError, 'Latitude and longitude axes must have the same shape.'
cdms2.error.CDMSError: Latitude and longitude axes must have the same shape.

durack1 commented 7 years ago
ncdump -h /clim_obs/orig/data/GPCP/v2.3/gpcp_cdr_v23rB1_y2007_m10.nc 
netcdf gpcp_cdr_v23rB1_y2007_m10 {                                                                                           
dimensions:                                                                                                                  
        nlat = 72 ;                                                                                                          
        nlon = 144 ;                                                                                                         
        time = 1 ;                                                                                                           
        nv = 2 ;                                                                                                             
variables:                                                                                                                   
        float latitude(nlat) ;                                                                                               
                latitude:long_name = "Latitude" ;                                                                            
                latitude:standard_name = "latitude" ;                                                                        
                latitude:units = "degrees_north" ;                                                                           
                latitude:valid_range = -90.f, 90.f ;                                                                         
                latitude:missing_value = -9999.f ;                                                                           
                latitude:bounds = "lat_bounds" ;                                                                             
        float longitude(nlon) ;                                                                                              
                longitude:long_name = "Longitude" ;                                                                          
                longitude:standard_name = "longitude" ;                                                                      
                longitude:units = "degrees_east" ;                                                                           
                longitude:valid_range = 0.f, 360.f ;                                                                         
                longitude:missing_value = -9999.f ;                                                                          
                longitude:bounds = "lon_bounds" ;                                                                            
        float time(time) ;                                                                                                   
                time:long_name = "time" ;                                                                                    
                time:standard_name = "time" ;                                                                                
                time:units = "days since 1970-01-01 00:00:00 0:00" ;                                                         
                time:calendar = "julian" ;                                                                                   
                time:axis = "T" ;                                                                                            
        float lat_bounds(nlat, nv) ;                                                                                         
                lat_bounds:units = "degrees_north" ;                                                                         
                lat_bounds:comment = "latitude values at the north and south bounds of each pixel." ;                        
        float lon_bounds(nlon, nv) ;                                                                                         
                lon_bounds:units = "degrees_east" ;                                                                          
                lon_bounds:comment = "longitude values at the west and east bounds of each pixel." ;                         
        float time_bounds(nv, time) ;                                                                                        
                time_bounds:units = "days since 1970-01-01 00:00:00 0:00" ;                                                  
                time_bounds:comment = "time bounds for each time value" ;                                                    
        float precip(nlat, nlon) ;                                                                                           
                precip:long_name = "NOAA Climate Data Record (CDR) of GPCP Satellite-Gauge Combined Precipitation" ;         
                precip:standard_name = "precipitation amount" ;                                                              
                precip:units = "millimeters/day" ;                                                                           
                precip:coordinates = "longitude latitude time" ;                                                             
                precip:valid_range = 0.f, 100.f ;                                                                            
                precip:cell_methods = "precip: mean" ;                                                                       
                precip:missing_value = -9999.f ;                                                                             
        float precip_error(nlat, nlon) ;                                                                                     
                precip_error:long_name = "NOAA CDR of GPCP Satellite-Gauge Combined Precipitation Error" ;                   
                precip_error:units = "millimeters/day" ;
                precip_error:coordinates = "longitude latitude" ;
                precip_error:valid_range = 0.f, 100.f ;
                precip_error:missing_value = -9999.f ;

// global attributes:
                :Conventions = "CF-1.6, ACDD 1.3" ;
                :title = "NOAA Climate Data Record (CDR) of Satellite-Gauge Precipitation from the Global Precipitation Climatatology Project (GPCP), V2.3" ;
                :source = "oc.200710.sg" ;
                :references = "Huffman et al. 1997, http://dx.doi.org/10.1175/1520-0477(1997)078<0005:TGPCPG>2.0.CO;2; Adler et al. 2003, http://dx.doi.org/10.1175/1525-7541(2003)004<1147:TVGPCP>2.0.CO;2; Huffman et al. 2009, http://dx.doi.org/10.1029/2009GL040000; Adler et al. 2016, Global Precipitation Climatology Project (GPCP) Monthly Analysis: Climate Algorithm Theoretical Basis Document (C-ATBD)" ;
                :history = "1) 2016-07-12T11:25:15Z, Dr. Jian-Jian Wang, U of Maryland, Created beta (B1) file" ;
                :Metadata_Conventions = "CF-1.5, Unidata Dataset Discovery v1.0, NOAA CDR v1.0, GDS v2.0" ;
                :standard_name_vocabulary = "CF Standard Name Table (v31, 08 March 2016)" ;
                :id = "gpcp_cdr_v23rB1_y2007_m10.nc" ;
                :naming_authority = "gov.noaa.ncdc" ;
                :date_created = "2016-07-12T11:25:15Z" ;
                :license = "No constraints on data access or use." ;
                :summary = "Global Precipitation Climatology Project (GPCP) Version 2.3 gridded, merged satellite/gauge precipitation Climate data Record (CDR) with errors from 1979 to present." ;
                :keywords = "EARTH SCIENCE > ATMOSPHERE > PRECIPITATION > PRECIPITATION AMOUNT" ;
                :keywords_vocabulary = "NASA Global Change Master Directory (GCMD) Earth Science Keywords, Version 7.0" ;
                :cdm_data_type = "Grid" ;
                :project = "GPCP > Global Precipitation Climatology Project" ;
                :processing_level = "NASA Level 3" ;
                :creator_name = "Dr. Jian-Jian Wang" ;
                :creator_email = "jjwang@umd.edu" ;
                :institution = "ACADEMIC > UMD/ESSIC > Earth System Science Interdisciplinary Center, University of Maryland" ;
                :publisher_name = "NOAA National Centers for Environmental Information (NCEI)" ;
                :publisher_email = "jjwang@umd.edu" ;
                :publisher_url = "https://www.ncei.noaa.gov" ;
                :geospatial_lat_min = "-88.75" ;
                :geospatial_lat_max = "88.75" ;
                :geospatial_lat_units = "degrees_north" ;
                :geospatial_lat_resolution = "2.5 degrees" ;
                :geospatial_lon_min = "1.25" ;
                :geospatial_lon_max = "358.75" ;
                :geospatial_lon_units = "degrees_east" ;
                :geospatial_lon_resolution = "2.5 degrees" ;
                :time_coverage_start = "2007-10-01T00:00:00Z" ;
                :time_coverage_end = "2007-10-31T23:59:59Z" ;
                :time_coverage_duration = "P1M" ;
                :contributor_name = "Robert Adler, George Huffman, Mathew Sapiano, Jian-Jian Wang" ;
                :contributor_role = "principalInvestigator, principalInvestigator, processor and custodian" ;
                :acknowledgment = "This project was supported in part by a grant from the NOAA Climate Data Record (CDR) Program for satellites." ;
                :cdr_program = "NOAA Climate Data Record Program for satellites, FY 2011." ;
                :cdr_variable = "precipitation" ;
                :metadata_link = "gov.noaa.ncdc:XXXXX" ;
                :product_version = "v23rB1" ;
                :platform = "GOES (Geostationary Operational Environmental Satellite), GMS (Japan Geostationary Meteorological Satellite), METEOSAT, Earth Observing System, AQUA, DMSP (Defense Meteorological Satellite Program)" ;
                :sensor = "Imager, Imager, Imager, AIRS > Atmospheric Infrared Sounder, SSMI > Special Sensor Microwave/Imager" ;
                :spatial_resolution = "2.5 degree" ;
                :comment = "Processing computer: eagle2.umd.edu" ;
}
dnadeau4 commented 7 years ago

I was able to make this work, but it broke a lot of other tests, especially the curvilinear ones.

In precip, the coordinates attribute points to longitude, latitude, and time, but precip itself is dimensioned (nlat, nlon) only.

It would have been much simpler for us if they had lat(lat) and lon(lon) like everybody else.

        float precip(nlat, nlon) ;                                                                                           
                precip:long_name = "NOAA Climate Data Record (CDR) of GPCP Satellite-Gauge Combined Precipitation" ;         
                precip:standard_name = "precipitation amount" ;                                                              
                precip:units = "millimeters/day" ;                                                                           
                precip:coordinates = "longitude latitude time" ;                                                             
                precip:valid_range = 0.f, 100.f ;                                                                            
                precip:cell_methods = "precip: mean" ;                                                                       
                precip:missing_value = -9999.f ;           
durack1 commented 7 years ago

@dnadeau4 the standard_name "precipitation amount" is missing an underscore, which makes me think that the file may not be CF-compliant after all, even if it could be opened to check..

dnadeau4 commented 7 years ago

After talking to @taylor13, it is clear the file is not CF-1 compliant. But Charles and I agree that CDMS should be able to retrieve the axes and create a grid from the coordinates attribute.

taylor13 commented 7 years ago

Actually, the file doesn't even follow the netCDF best practices (independent of CF), which say you should "Make coordinate variables for every dimension possible (except for string length dimensions)." See https://www.unidata.ucar.edu/software/netcdf/docs/BestPractices.html. If you want to access the data array in netCDF using coordinate values (rather than by index), the dimension must be a "coordinate variable", which means it must be of the form x(x) (e.g., not longitude(nlon), but longitude(longitude)). Since CDMS is built on the notion that data should be accessible by providing dimension values (and not just indices), all dimensions should be defined as coordinate variables.

In summary, although core dumping shouldn't be the result of CDMS trying to read this file, I think it should error exit (gracefully), and not just give a warning. The user should rewrite the file with the longitude dimension defined as longitude(longitude) and the latitude dimension defined as latitude(latitude), in accordance with best practices and so that CDMS won't get tripped up later if the user tries to extract data using coordinates.
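
For anyone who wants to rewrite such a file along those lines, a minimal sketch (assuming netCDF4-python is available and the file format allows in-place renaming; the copy name is just illustrative) is to rename the dimensions so they match the existing coordinate-variable names:

import shutil
from netCDF4 import Dataset

# Work on a copy so the original download stays untouched
shutil.copy('gpcp_cdr_v23rB1_y2016_m06.nc', 'gpcp_fixed.nc')
nc = Dataset('gpcp_fixed.nc', 'a')
# After the renames, latitude(latitude) and longitude(longitude) are true
# netCDF coordinate variables, which is what CDMS expects
nc.renameDimension('nlat', 'latitude')
nc.renameDimension('nlon', 'longitude')
nc.close()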

durack1 commented 7 years ago

@dnadeau4 my thoughts on this: cdms should be able to read ANYTHING, so if any of the coordinate or dimension handling fails, you get a warning about the failure, and rather than the complete "dressed" TransientVariable you get access to the numpy array (or a TransientVariable) with the coordinates and attributes unset, i.e. all None or equivalent types.

taylor13 commented 7 years ago

yes, that would probably be fine, but it might require major modification of CDMS ... is the effort worth it? That's up to Charles and Dean, since they're paying for it, and Denis, since he's already oversubscribed.

durack1 commented 7 years ago

@doutriaux1 pinging you here

doutriaux1 commented 7 years ago

@dnadeau4 and I talked about it at length, I trust @dnadeau4 to do what's best considering effort/value/time

taylor13 commented 7 years ago

I think that CDMS was built on the assumption that the longitude and latitude values would be stored as true netCDF coordinate variables (e.g., lon(lon), not lon(nlon)), so it may be difficult to anticipate how extensive the revisions will be if CDMS is to handle this type of structure. I had never heard of this error being encountered before (in over a decade), so I don't think generalizing CDMS should be a high priority. I agree, however, that rather than core dumping, CDMS should error exit and point out why the file cannot be read.

durack1 commented 7 years ago

@taylor13 I think the inverse: if it fails to "dress" a TransientVariable with correct coordinates and attributes, it should fall back to standard numpy (the matrix without all the dressings), or better, a TransientVariable with null/None values for all coordinates/attributes that cannot be determined by the code.

taylor13 commented 7 years ago

But aren't there CDMS functions that assume that the variables are "dressed"? How will they respond if an undressed variable is given to them? Won't all these functions then have to include error responses in such cases?

durack1 commented 7 years ago

The issue here is that the cdms library fails when trying to open a valid (but not CF-compliant) file. Any netcdf file should be able to be opened, and any valid netcdf variable in a complete (uncorrupted) netcdf file should be able to be read. This is the issue being discussed at the moment. If the library has trouble "dressing" the numpy array because of problems with the file, a warning should be thrown rather than leaving the user unable to do anything with the file and its data.

What you do after you have a variable in memory (and what tools you use) is up to the user..

doutriaux1 commented 7 years ago

Ok, I think @durack1 is onto something here. @dnadeau4, rather than failing we could just return a basic TV. As @taylor13 mentioned, most code assuming metadata will fail, but that's ok because, as @taylor13 also pointed out, the file is poorly formatted anyway and should be rewritten. That way we win on all fronts: the user can keep working on the file without waiting for another version of cdms2, but at the same time the experience will be limited, encouraging the user to do the right thing and rewrite the data.

gleckler1 commented 7 years ago

I like PD's compromise. If the data could be read into memory with a warning, it would be possible to fix things (dress it properly) within CDAT rather than having to rely on something else to rewrite the file first. BTW, this is a very popular dataset... we should contact the producers and explain their data is not CF compliant before their habits are copied.

durack1 commented 7 years ago

@doutriaux1 exactly. I am a user, and I try to write CF-compliant and well-described files.. But this particular issue arose because @gleckler1 was trying to read a file that someone else had produced, and was unable to even open the file using cdms. The only alternative in this case is to use another I/O library altogether, which is really something you want to avoid at all costs.

I think a warning message should be sent to the screen in such a case, to at a minimum point out to the user that further manipulation of the TV is not going to be seamless.. It would also be great if the other TV functions threw a warning when missing metadata caused problems with a computation, but that is of tertiary importance to successfully opening and reading a file.
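
(For context, the "other I/O library" route looks roughly like the sketch below, assuming netCDF4-python is installed and using the variable names from the ncdump above. It gets the raw arrays into memory, but with none of the cdms2 dressing.)

from netCDF4 import Dataset

nc = Dataset('gpcp_cdr_v23rB1_y2016_m06.nc')
precip = nc.variables['precip'][:]      # plain masked array, no cdms2 metadata
lats = nc.variables['latitude'][:]      # 72 cell-centre latitudes
lons = nc.variables['longitude'][:]     # 144 cell-centre longitudes
nc.close()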

dnadeau4 commented 7 years ago

If the coordinates attribute is set, I can try to retrieve the lon/lat variables and rename them to match the dimension names. This way the user will have a dressed transient variable. This should not be too difficult.

If the coordinates attribute does not exist, then I need to change the rectangular grid function to create a generic rectangular grid if the lat/lon rank is 1-D. That should do the trick.

I would add a lot of warnings to make sure users are aware of what happened. Then they will be able to run cfchecker and figure out what to change.
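
A rough sketch of the lookup being described (illustration only, not the actual cdms2 patch; fileobj here is a netCDF4-python-style Dataset and resolve_axis is a hypothetical helper name):

import warnings

def resolve_axis(fileobj, var, dim_name):
    # True coordinate variable, e.g. lat(lat): nothing special to do
    if dim_name in fileobj.variables:
        return fileobj.variables[dim_name]
    # Otherwise fall back to the variables named in the "coordinates" attribute
    for name in getattr(var, 'coordinates', '').split():
        cand = fileobj.variables.get(name)
        if cand is not None and cand.shape == (len(fileobj.dimensions[dim_name]),):
            warnings.warn('Using "%s" as the coordinate variable for dimension "%s"; '
                          'verify that your file is CF compliant' % (name, dim_name))
            return cand
    # No match: caller builds a plain index axis / generic rectangular grid
    return None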

dnadeau4 commented 7 years ago

@gleckler1 do you want to contact them or should I?

taylor13 commented 7 years ago

I'm o.k. with returning an undressed transient variable, but I don't think you should do anything to try to correct it. We want the person who created the file to make it CF compliant (and adhere to the netCDF best practices). The file is not compliant with CF, so we shouldn't try to "fix" it to become CF compliant. We should, of course, tell the user we encountered a problem, and we should tell them via warnings.

CDMS should not make any attempt to create grids, identify longitude and latitude, etc. from the file. If you do this, you are enabling folks to get way too sloppy, and that defeats the purpose of a convention.

Note that if the "coordinates" attribute had been left out of the above example, CDMS should probably respond in the same way. Does anyone know what would actually happen in that case? Would it core dump or be happy since the "coordinates" attribute was missing?

taylor13 commented 7 years ago

@dnadeau4 @durack1 @doutriaux1 @gleckler1 -- forgot to ping you all re the above comment.

gleckler1 commented 7 years ago

I think an undressed TV is sufficient if we can also retrieve the axes separately. It may take some work to put the clothes back on, but better if we can do that within our tools. I do think we should contact the dataset curator... this might carry the most weight if it comes from KET, unless he prefers PG to do it. It would be good to cc some of the key players, at a minimum Adler and Huffman. From http://eagle1.umd.edu/GPCP_ICDR/GPCP_Background.html, the dataset curator is:

Dr. Jian-Jian Wang ESSIC, University of Maryland College Park College Park, MD 20742 USA Phone: +1 301-405-4887 jjwang@umd.edu

durack1 commented 7 years ago

I agree completely with @gleckler1: the tool should be able to read anything, or at least provide a user with the ability to read any properly formed (i.e. not garbled in download) netcdf file. Once such a file is opened, a user should be able to access any of the file variables, e.g. read a variable called myfavoritelon rather than only the well-named and currently expected variable lon.

Responding to @taylor13, I personally think that redressing a variable is not a priority, as it would be near impossible to anticipate all cases.. Reading all files, and all variables, is the key motivation here.

taylor13 commented 7 years ago

An important consideration for CDMS development is whether we expect it to continue to be used as part of the official CF checker (developed elsewhere). If so, we will need CDMS to reflect (and not alter) what's in the file. If we infer information from file metadata in ways not mandated by the CF conventions, and then pass the structure to the checker, the checker will assume we got the information through CF rules and find the file in conformance. This would be wrong.

Perhaps Denis' error message for the case considered here could be passed along to the checker, to make it clear that the file has been adulterated.

dnadeau4 commented 7 years ago

Here are my 2 cents.

This was the old CDMS output.

Traceback (most recent call last):
  File "glecker.py", line 8, in <module>
    f=cdms2.open("gpcp_cdr_v23rB1_y2016_m08.nc")
  File "/software/anaconda2/envs/cmor3/lib/python2.7/site-packages/cdms2/dataset.py", line 212, in openDataset
    file1 = CdmsFile(path,"r")
  File "/software/anaconda2/envs/cmor3/lib/python2.7/site-packages/cdms2/dataset.py", line 1016, in __init__
    grid = FileGenericGrid(lat, lon, gridname, parent=self, maskvar=maskvar)
  File "/software/anaconda2/envs/cmor3/lib/python2.7/site-packages/cdms2/gengrid.py", line 305, in __init__
    AbstractGenericGrid.__init__(self, latAxis, lonAxis, id, maskvar, tempmask, node)
  File "/software/anaconda2/envs/cmor3/lib/python2.7/site-packages/cdms2/gengrid.py", line 22, in __init__
    raise CDMSError, 'Latitude and longitude axes must have the same shape.'
cdms2.error.CDMSError: Latitude and longitude axes must have the same shape.

This is the new one, which allows the file to be used.

/software/anaconda2/envs/devel/lib/python2.7/site-packages/cdms2/dataset.py:1173: UserWarning: 
** Variable "latitude" found in the "coordinates" attribute will be used as coordinate variable
** CDMS could not find the coordinate variable "nlat(nlat)"
** Verify that your file is CF-1 compliant!
  warnings.warn(msg)
/software/anaconda2/envs/devel/lib/python2.7/site-packages/cdms2/dataset.py:1173: UserWarning: 
** Variable "longitude" found in the "coordinates" attribute will be used as coordinate variable
** CDMS could not find the coordinate variable "nlon(nlon)"
** Verify that your file is CF-1 compliant!
  warnings.warn(msg)

But running cfchecker.py on this file seems to introduce erroneous warnings about missing_value in the bounds variables.

 cfchecks -v1.6 gpcp_cdr_v23rB1_y2016_m08.nc

CHECKING NetCDF FILE: gpcp_cdr_v23rB1_y2016_m08.nc
=====================
/software/anaconda2/envs/devel/lib/python2.7/site-packages/cdms2/dataset.py:1173: UserWarning: 
** Variable "latitude" found in the "coordinates" attribute will be used as coordinate variable
** CDMS could not find the coordinate variable "nlat(nlat)"
** Verify that your file is CF-1 compliant!
  warnings.warn(msg)
/software/anaconda2/envs/devel/lib/python2.7/site-packages/cdms2/dataset.py:1173: UserWarning: 
** Variable "longitude" found in the "coordinates" attribute will be used as coordinate variable
** CDMS could not find the coordinate variable "nlon(nlon)"
** Verify that your file is CF-1 compliant!
  warnings.warn(msg)
Using CF Checker Version 2.0.9
Checking against CF Version CF-1.6
Using Standard Name Table Version 44 (2017-05-23T11:17:23Z)
Using Area Type Table Version 6 (22 February 2017)

------------------
Checking variable: lat_bounds
------------------
ERROR: Attribute missing_value of incorrect type
WARNING (7.1): Boundary Variable lat_bounds should not have missing_value attribute

------------------
Checking variable: time_bounds
------------------
WARNING (3): No standard_name or long_name attribute specified
ERROR: Attribute missing_value of incorrect type

------------------
Checking variable: time
------------------

------------------
Checking variable: longitude
------------------
WARNING: attribute missing_value attached to wrong kind of variable

------------------
Checking variable: precip
------------------
ERROR (3.3): Invalid standard_name: precipitation
ERROR (3.3): Invalid standard_name modifier: amount
ERROR (7.3): Invalid 'name' in cell_methods attribute: precip

------------------
Checking variable: lon_bounds
------------------
ERROR: Attribute missing_value of incorrect type
WARNING (7.1): Boundary Variable lon_bounds should not have missing_value attribute

------------------
Checking variable: latitude
------------------
WARNING: attribute missing_value attached to wrong kind of variable

------------------
Checking variable: precip_error
------------------

ERRORS detected: 5
WARNINGS given: 5
INFORMATION messages: 0
dnadeau4 commented 7 years ago

I deleted the forced missing_value in fvariable and was able to get the right report from cfchecker.


CHECKING NetCDF FILE: gpcp_cdr_v23rB1_y2016_m08.nc
=====================
/software/anaconda2/envs/devel/lib/python2.7/site-packages/cdms2/dataset.py:1173: UserWarning: 
** Variable "latitude" found in the "coordinates" attribute will be used as coordinate variable
** CDMS could not find the coordinate variable "nlat(nlat)"
** Verify that your file is CF-1 compliant!
  warnings.warn(msg)
/software/anaconda2/envs/devel/lib/python2.7/site-packages/cdms2/dataset.py:1173: UserWarning: 
** Variable "longitude" found in the "coordinates" attribute will be used as coordinate variable
** CDMS could not find the coordinate variable "nlon(nlon)"
** Verify that your file is CF-1 compliant!
  warnings.warn(msg)
Using CF Checker Version 2.0.9
Checking against CF Version CF-1.6
Using Standard Name Table Version 44 (2017-05-23T11:17:23Z)
Using Area Type Table Version 6 (22 February 2017)

------------------
Checking variable: lat_bounds
------------------

------------------
Checking variable: time_bounds
------------------
WARNING (3): No standard_name or long_name attribute specified

------------------
Checking variable: time
------------------

------------------
Checking variable: longitude
------------------
WARNING: attribute missing_value attached to wrong kind of variable

------------------
Checking variable: precip
------------------
ERROR (3.3): Invalid standard_name: precipitation
ERROR (3.3): Invalid standard_name modifier: amount
ERROR (7.3): Invalid 'name' in cell_methods attribute: precip

------------------
Checking variable: lon_bounds
------------------

------------------
Checking variable: latitude
------------------
WARNING: attribute missing_value attached to wrong kind of variable

------------------
Checking variable: precip_error
------------------

ERRORS detected: 2
WARNINGS given: 3
INFORMATION messages: 0
gleckler1 commented 7 years ago

That looks much better, and although I don't see a python prompt I gather that the undressed array is read into memory. Do I have that right?

While it's great to verify that the data has a CF-compliant grid structure, I don't think the standard name should be checked: when CDMS is used for research we often save obscure diagnostics that don't have standard names.

Denis, did you mention previously that the CF checker is no longer using CDMS?



dnadeau4 commented 7 years ago

CF checker is using CDMS, and the array is read by cfchecks into memory.

The time_bounds variable was not attached to the time variable via a bounds attribute; this is why cfchecks says that its standard_name is missing.

So we found another problem with that file, but at least CDMS was able to open it.
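
(If the producer wanted a quick fix for that particular problem, attaching the bounds is a one-attribute change; a sketch with netCDF4-python, assuming write access to the file:)

from netCDF4 import Dataset

nc = Dataset('gpcp_cdr_v23rB1_y2016_m08.nc', 'a')
nc.variables['time'].bounds = 'time_bounds'   # point time at its bounds variable
nc.close()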

durack1 commented 7 years ago

@dnadeau4 what is the output of f.variable (Or is it variables?) after the file is opened with the warnings?

dnadeau4 commented 7 years ago
>>> import cdms2
>>> f=cdms2.open("gpcp_cdr_v23rB1_y2016_m08.nc")
/software/anaconda2/envs/devel/lib/python2.7/site-packages/cdms2/dataset.py:1173: UserWarning: 
** Variable "latitude" found in the "coordinates" attribute will be used as coordinate variable
** CDMS could not find the coordinate variable "nlat(nlat)"
** Verify that your file is CF-1 compliant!
  warnings.warn(msg)
/software/anaconda2/envs/devel/lib/python2.7/site-packages/cdms2/dataset.py:1173: UserWarning: 
** Variable "longitude" found in the "coordinates" attribute will be used as coordinate variable
** CDMS could not find the coordinate variable "nlon(nlon)"
** Verify that your file is CF-1 compliant!
  warnings.warn(msg)

>>> f.listvariables()
['lat_bounds', 'nlat', 'time_bounds', 'longitude', 'precip', 'lon_bounds', 'latitude', 'nlon', 'precip_error']
dnadeau4 commented 7 years ago

In this case,

nlat == longitude
nlon == latitude
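
Once the undressed variable is in hand, one way to put the real coordinates back on (a sketch only, assuming the behaviour above where latitude and longitude are still readable as ordinary file variables):

import cdms2

f = cdms2.open('gpcp_cdr_v23rB1_y2016_m08.nc')
precip = f('precip')                               # undressed: axes are plain indices
lat = cdms2.createAxis(f('latitude')[:], id='latitude')
lat.designateLatitude()
lat.units = 'degrees_north'
lon = cdms2.createAxis(f('longitude')[:], id='longitude')
lon.designateLongitude()
lon.units = 'degrees_east'
precip.setAxis(0, lat)                             # nlat (72 points) -> latitude
precip.setAxis(1, lon)                             # nlon (144 points) -> longitude
f.close()
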
jypeter commented 7 years ago

I have just read this long thread, and I agree that it would be nice if cdms2 could at least return a numpy array when reading problematic datasets, so that I don't have to read the netCDF4 documentation to handle them. Some best-effort behaviour with enough warnings sounds good!

Now, if the problematic axes (or whatever passes for axes) don't come attached to the problematic variables, will I also be able to read these "axes" as variables (i.e. get the axes' values into memory)? I have a vague memory of having trouble reading axes independently of variables.

Also, this mail mentioning the CF convention has reminded me of problems I have sometimes had with packed data. Details in #140

dnadeau4 commented 7 years ago

I created transient axes for such a case. The solution is simple; it just took me a while to understand the architecture.

import cdms2
f=cdms2.open("gpcp_cdr_v23rB1_y2016_m08.nc")
data=f['precip']
data.getAxis(0)[:]
array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,
        11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.,  20.,  21.,
        22.,  23.,  24.,  25.,  26.,  27.,  28.,  29.,  30.,  31.,  32.,
        33.,  34.,  35.,  36.,  37.,  38.,  39.,  40.,  41.,  42.,  43.,
        44.,  45.,  46.,  47.,  48.,  49.,  50.,  51.,  52.,  53.,  54.,
        55.,  56.,  57.,  58.,  59.,  60.,  61.,  62.,  63.,  64.,  65.,
        66.,  67.,  68.,  69.,  70.,  71.], dtype=float32)
data.getAxis(1)[:]
array([   0.,    1.,    2.,    3.,    4.,    5.,    6.,    7.,    8.,
          9.,   10.,   11.,   12.,   13.,   14.,   15.,   16.,   17.,
         18.,   19.,   20.,   21.,   22.,   23.,   24.,   25.,   26.,
         27.,   28.,   29.,   30.,   31.,   32.,   33.,   34.,   35.,
         36.,   37.,   38.,   39.,   40.,   41.,   42.,   43.,   44.,
         45.,   46.,   47.,   48.,   49.,   50.,   51.,   52.,   53.,
         54.,   55.,   56.,   57.,   58.,   59.,   60.,   61.,   62.,
         63.,   64.,   65.,   66.,   67.,   68.,   69.,   70.,   71.,
         72.,   73.,   74.,   75.,   76.,   77.,   78.,   79.,   80.,
         81.,   82.,   83.,   84.,   85.,   86.,   87.,   88.,   89.,
         90.,   91.,   92.,   93.,   94.,   95.,   96.,   97.,   98.,
         99.,  100.,  101.,  102.,  103.,  104.,  105.,  106.,  107.,
        108.,  109.,  110.,  111.,  112.,  113.,  114.,  115.,  116.,
        117.,  118.,  119.,  120.,  121.,  122.,  123.,  124.,  125.,
        126.,  127.,  128.,  129.,  130.,  131.,  132.,  133.,  134.,
        135.,  136.,  137.,  138.,  139.,  140.,  141.,  142.,  143.], dtype=float32)
dnadeau4 commented 7 years ago

[screenshot from 2017-08-03 17-26-10: plot of the data]

gleckler1 commented 7 years ago

BRAVO!


taylor13 commented 7 years ago

I have just reviewed this issue and while you can now read the data (and even plot it -- very nice), I don't think the error messages are sufficiently explicit. Here are my comments/questions:

1) CDMS should not automatically attach a "missing_value" attribute to any variable, and as you found out, this will produce an error in the case of coordinate variables.

2) The CF checker should (and does) throw a warning message when both the standard name and the long name are missing (because although neither is required by CF, at least one of them is "strongly recommended"). For "obscure" variables that haven't yet been assigned a standard_name, you should define the long_name; then you won't get a warning from the CF checker. In the case of precip, of course, the standard_name exists and was simply incorrect in the file, as the CF checker pointed out.

3) As some of you may have noticed, besides the problem with the coordinate attributes not being CF compliant, there are other problems, including

4) I don't think the error message you provide is strong enough. It just says "** Verify that your file is CF-1 compliant!". Then you run the checker, and it doesn't say anything about the coordinates being non-compliant, so as a user I would assume that this is not a problem. We want the user to learn that their files are non-compliant by running the checker, but the checker doesn't raise an error because you fixed the problem after reading the data in. Your error message should say something more explicit, like:

** Your file does not conform to CF standards or even the netCDF best practices.
  CDMS has corrected certain problems, but you should rewrite your file to make
it CF-compliant by correcting the problem with your coordinates.  Note that 
since your file's data structures have been altered by CDMS, the CF checker 
cannot be trusted to expose all the errors in your file.

5) The result of your fix is: >>> f.listvariables() ['lat_bounds', 'nlat', 'time_bounds', 'longitude', 'precip', 'lon_bounds', 'latitude', 'nlon', 'precip_error']. I presume scalars like "nv" are omitted from this list. In the original, nlon and nlat are scalars, but you've made them into coordinate variables, so you've clearly altered the structure of the file. That's why the CF checker doesn't throw an error about the coordinates.

dnadeau4 commented 7 years ago

@taylor13

  1. I don't assign missing_values automatically, and I am slowly getting ready for the wrath of Ken.
  2. The CF-checker abandoned CDMS, but the old version still works. I don't think I should define a long_name; I should not mess with what is in the file. This is the reason why we removed the use of the coordinates attribute to regenerate the grid: you said it could make users sloppy.
  3. Nothing for CDMS.
  4. This warning does not exist anymore. Now CDMS can create a rectangular grid from the nlat, nlon axes. You can now manipulate this file with CDMS, which was the goal.
  5. This is the new list:

    f.listvariables()
    ['lat_bounds', 'time_bounds', 'longitude', 'precip', 'lon_bounds', 'latitude', 'precip_error']
    f.listdimension()
    ['nlat', 'nlon', 'nv', 'time']
taylor13 commented 7 years ago

I don't think you should be correcting their files either. I was just pointing out all the things wrong with the original file.

Concerning 4 ... what I was trying to say is that you should provide a very strong warning. Getting rid of the warning is not the answer. We want to encourage them to correct the file, not have CDMS correct it. I'm fine with CDMS being able to read it now, but CDMS should point out that they were very sloppy in creating their file, and that you had to generate coordinate axes for them in order to read and plot the data.

If no one is using the old CF checker, then this is less important, but if some folks still use it to check their files, then CDMS must tell them when their files are out of compliance. You have hidden the problems with the file from the checker.

If we don't want CDMS used as a checker, we should make sure no one can download the CF-checker version that is based on CDMS.

taylor13 commented 7 years ago

Just to reiterate ... As I understand it, PrePARE uses CDMS to read files destined for the CMIP6 archive, and in one of its "checks" PrePARE runs the "old" version of the CF-Checker. If CDMS alters the data structures or "decorates" variables it finds in the files before performing its checks, then it won't really be evaluating the original file for conformance, but rather what CDMS generates from that file. We must therefore be sure that if CDMS makes any changes to the data (like correcting the coordinate information), the user is informed (with an explicit warning) that the CF checker could be fooled into thinking the file is CF-compliant. That is why I suggested in item 4 of https://github.com/UV-CDAT/cdms/issues/105#issuecomment-320301200 that an explicit warning be raised.

durack1 commented 7 years ago

Just an FYI, the CF-checker that many folks use is accessible online at http://puma.nerc.ac.uk/cgi-bin/cf-checker.pl - that version is using an older version of CDMS2. They have rewritten the checker to use netcdf4-python and that version (linked off the page above) is found at http://pumatest.nerc.ac.uk/cgi-bin/cf-checker-dev.pl