gleckler1 opened 7 years ago
ncdump -h /clim_obs/orig/data/GPCP/v2.3/gpcp_cdr_v23rB1_y2007_m10.nc
netcdf gpcp_cdr_v23rB1_y2007_m10 {
dimensions:
    nlat = 72 ;
    nlon = 144 ;
    time = 1 ;
    nv = 2 ;
variables:
    float latitude(nlat) ;
        latitude:long_name = "Latitude" ;
        latitude:standard_name = "latitude" ;
        latitude:units = "degrees_north" ;
        latitude:valid_range = -90.f, 90.f ;
        latitude:missing_value = -9999.f ;
        latitude:bounds = "lat_bounds" ;
    float longitude(nlon) ;
        longitude:long_name = "Longitude" ;
        longitude:standard_name = "longitude" ;
        longitude:units = "degrees_east" ;
        longitude:valid_range = 0.f, 360.f ;
        longitude:missing_value = -9999.f ;
        longitude:bounds = "lon_bounds" ;
    float time(time) ;
        time:long_name = "time" ;
        time:standard_name = "time" ;
        time:units = "days since 1970-01-01 00:00:00 0:00" ;
        time:calendar = "julian" ;
        time:axis = "T" ;
    float lat_bounds(nlat, nv) ;
        lat_bounds:units = "degrees_north" ;
        lat_bounds:comment = "latitude values at the north and south bounds of each pixel." ;
    float lon_bounds(nlon, nv) ;
        lon_bounds:units = "degrees_east" ;
        lon_bounds:comment = "longitude values at the west and east bounds of each pixel." ;
    float time_bounds(nv, time) ;
        time_bounds:units = "days since 1970-01-01 00:00:00 0:00" ;
        time_bounds:comment = "time bounds for each time value" ;
    float precip(nlat, nlon) ;
        precip:long_name = "NOAA Climate Data Record (CDR) of GPCP Satellite-Gauge Combined Precipitation" ;
        precip:standard_name = "precipitation amount" ;
        precip:units = "millimeters/day" ;
        precip:coordinates = "longitude latitude time" ;
        precip:valid_range = 0.f, 100.f ;
        precip:cell_methods = "precip: mean" ;
        precip:missing_value = -9999.f ;
    float precip_error(nlat, nlon) ;
        precip_error:long_name = "NOAA CDR of GPCP Satellite-Gauge Combined Precipitation Error" ;
        precip_error:units = "millimeters/day" ;
        precip_error:coordinates = "longitude latitude" ;
        precip_error:valid_range = 0.f, 100.f ;
        precip_error:missing_value = -9999.f ;

// global attributes:
        :Conventions = "CF-1.6, ACDD 1.3" ;
        :title = "NOAA Climate Data Record (CDR) of Satellite-Gauge Precipitation from the Global Precipitation Climatatology Project (GPCP), V2.3" ;
        :source = "oc.200710.sg" ;
        :references = "Huffman et al. 1997, http://dx.doi.org/10.1175/1520-0477(1997)078<0005:TGPCPG>2.0.CO;2; Adler et al. 2003, http://dx.doi.org/10.1175/1525-7541(2003)004<1147:TVGPCP>2.0.CO;2; Huffman et al. 2009, http://dx.doi.org/10.1029/2009GL040000; Adler et al. 2016, Global Precipitation Climatology Project (GPCP) Monthly Analysis: Climate Algorithm Theoretical Basis Document (C-ATBD)" ;
        :history = "1) 2016-07-12T11:25:15Z, Dr. Jian-Jian Wang, U of Maryland, Created beta (B1) file" ;
        :Metadata_Conventions = "CF-1.5, Unidata Dataset Discovery v1.0, NOAA CDR v1.0, GDS v2.0" ;
        :standard_name_vocabulary = "CF Standard Name Table (v31, 08 March 2016)" ;
        :id = "gpcp_cdr_v23rB1_y2007_m10.nc" ;
        :naming_authority = "gov.noaa.ncdc" ;
        :date_created = "2016-07-12T11:25:15Z" ;
        :license = "No constraints on data access or use." ;
        :summary = "Global Precipitation Climatology Project (GPCP) Version 2.3 gridded, merged satellite/gauge precipitation Climate data Record (CDR) with errors from 1979 to present." ;
        :keywords = "EARTH SCIENCE > ATMOSPHERE > PRECIPITATION > PRECIPITATION AMOUNT" ;
        :keywords_vocabulary = "NASA Global Change Master Directory (GCMD) Earth Science Keywords, Version 7.0" ;
        :cdm_data_type = "Grid" ;
        :project = "GPCP > Global Precipitation Climatology Project" ;
        :processing_level = "NASA Level 3" ;
        :creator_name = "Dr. Jian-Jian Wang" ;
        :creator_email = "jjwang@umd.edu" ;
        :institution = "ACADEMIC > UMD/ESSIC > Earth System Science Interdisciplinary Center, University of Maryland" ;
        :publisher_name = "NOAA National Centers for Environmental Information (NCEI)" ;
        :publisher_email = "jjwang@umd.edu" ;
        :publisher_url = "https://www.ncei.noaa.gov" ;
        :geospatial_lat_min = "-88.75" ;
        :geospatial_lat_max = "88.75" ;
        :geospatial_lat_units = "degrees_north" ;
        :geospatial_lat_resolution = "2.5 degrees" ;
        :geospatial_lon_min = "1.25" ;
        :geospatial_lon_max = "358.75" ;
        :geospatial_lon_units = "degrees_east" ;
        :geospatial_lon_resolution = "2.5 degrees" ;
        :time_coverage_start = "2007-10-01T00:00:00Z" ;
        :time_coverage_end = "2007-10-31T23:59:59Z" ;
        :time_coverage_duration = "P1M" ;
        :contributor_name = "Robert Adler, George Huffman, Mathew Sapiano, Jian-Jian Wang" ;
        :contributor_role = "principalInvestigator, principalInvestigator, processor and custodian" ;
        :acknowledgment = "This project was supported in part by a grant from the NOAA Climate Data Record (CDR) Program for satellites." ;
        :cdr_program = "NOAA Climate Data Record Program for satellites, FY 2011." ;
        :cdr_variable = "precipitation" ;
        :metadata_link = "gov.noaa.ncdc:XXXXX" ;
        :product_version = "v23rB1" ;
        :platform = "GOES (Geostationary Operational Environmental Satellite), GMS (Japan Geostationary Meteorological Satellite), METEOSAT, Earth Observing System, AQUA, DMSP (Defense Meteorological Satellite Program)" ;
        :sensor = "Imager, Imager, Imager, AIRS > Atmospheric Infrared Sounder, SSMI > Special Sensor Microwave/Imager" ;
        :spatial_resolution = "2.5 degree" ;
        :comment = "Processing computer: eagle2.umd.edu" ;
}
I was able to make this work, but it broke a lot of other tests, especially the curvilinear ones. In precip, the coordinates attribute points to longitude, latitude and time, yet precip is dimensioned (nlat, nlon) only. It would have been much simpler for us if they had used lat(lat) and lon(lon) like everybody else.
float precip(nlat, nlon) ;
precip:long_name = "NOAA Climate Data Record (CDR) of GPCP Satellite-Gauge Combined Precipitation" ;
precip:standard_name = "precipitation amount" ;
precip:units = "millimeters/day" ;
precip:coordinates = "longitude latitude time" ;
precip:valid_range = 0.f, 100.f ;
precip:cell_methods = "precip: mean" ;
precip:missing_value = -9999.f ;
@dnadeau4 the standard_name "precipitation amount" is missing an underscore, which makes me think that the file may not be CF-compliant after all, even if it could be opened to check.
Talking to @taylor13, the file is not CF-1 compliant. But Charles and I agree that CDMS should be able to retrieve axes and create a grid from the coordinates attribute.
Actually, the file doesn't even follow the netCDF best practices (independent of CF), which say you should "Make coordinate variables for every dimension possible (except for string length dimensions)." See https://www.unidata.ucar.edu/software/netcdf/docs/BestPractices.html. If you want to access the data array in netCDF using the coordinate values (rather than by index), the dimension must be a "coordinate variable", which means it must be of the form x(x) (e.g., not longitude(nlon), but longitude(longitude)). Since CDMS is built on the notion that data should be accessible by providing the dimension values (and not just the indices), all dimensions should be defined as coordinate variables. In summary, although core dumping shouldn't be the result of CDMS trying to read this file, I think it should error exit (gracefully), not just give a warning. The user should rewrite the file with the longitude dimension defined as longitude(longitude) and the latitude dimension defined as latitude(latitude), in accordance with best practices, so that CDMS won't get tripped up later if the user tries to extract data using coordinates.
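The rewrite being asked for amounts to renaming each such dimension after its 1-D variable, so latitude(nlat) becomes latitude(latitude). A sketch on a toy dict model of the header (an illustration only; on the actual file one would use a tool such as NCO's ncrename, or rewrite it with a netCDF library):

```python
# Toy model: variable name -> tuple of dimension names (assumption: dicts
# stand in for the real file; this is not netCDF I/O).
variables = {
    "latitude": ("nlat",),
    "longitude": ("nlon",),
    "time": ("time",),
    "precip": ("nlat", "nlon"),
}

def rename_dims_to_match_coords(variables):
    """Rename each dimension carrying a single 1-D variable so that variable
    becomes a true coordinate variable, e.g. latitude(nlat) -> latitude(latitude)."""
    # Map old dimension name -> the 1-D variable defined on it.
    mapping = {dims[0]: name
               for name, dims in variables.items()
               if len(dims) == 1 and name != dims[0]}
    return {name: tuple(mapping.get(d, d) for d in dims)
            for name, dims in variables.items()}

fixed = rename_dims_to_match_coords(variables)
print(fixed["precip"])    # -> ('latitude', 'longitude')
print(fixed["latitude"])  # -> ('latitude',)
```

After this rename, every spatial dimension has an x(x) coordinate variable, which is the layout CDMS expects.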
@dnadeau4 my thoughts on this: cdms should be able to read ANYTHING, so if any of the coordinate or dimension handling fails, you get a warning about the failure and, rather than the complete "dressed" TransientVariable, you get access to the numpy array (or a TransientVariable) with the coordinates and attributes unset, so they are all None or equivalent types.
yes, that would probably be fine, but it might require major modification of CDMS ... is the effort worth it? That's up to Charles and Dean, since they're paying for it, and Denis, since he's already oversubscribed.
@doutriaux1 pinging you here
@dnadeau4 and I talked about it at length, I trust @dnadeau4 to do what's best considering effort/value/time
I think that CDMS was built on the assumption that the longitude and latitude values would be stored as true netCDF coordinate variables (e.g., lon(lon), not lon(nlon)), so it may be difficult to anticipate how extensive the revisions will be if CDMS is to handle this type of structure. I never heard of this error being encountered before (in over a decade), so that's why I don't think it should be high priority to generalize CDMS. I agree, however, rather than core dumping, CDMS should error exit and point out why the file cannot be read.
@taylor13 I think the inverse: if it fails to "dress" a TransientVariable with correct coordinates and attributes, it falls back to a standard numpy array, i.e. the matrix without all the dressings, or better a TransientVariable with null/None values for all coordinates/attributes that cannot be determined by the code.
But aren't there CDMS functions that assume that the variables are "dressed"? How will they respond if an undressed variable is given to them? Won't all these functions then have to include error responses in such cases?
The issue here is that the cdms library fails when trying to open a valid (but not CF-compliant) file. Any netCDF file should be able to be opened, and any valid netCDF variable in a complete (uncorrupted) netCDF file should be able to be read. This is the issue being discussed at the moment. If the library has trouble "dressing" the numpy array because of problems with the file, a warning should be thrown rather than leaving the user unable to do anything with the file and the data.
What you do after you have a variable in memory (and what tools you use) is up to the user.
Ok, I think @durack1 is onto something here. @dnadeau4, rather than failing we could just return a basic TV. As @taylor13 mentioned, most code assuming metadata will fail, but that's ok; as @taylor13 also mentioned, the file is poorly formatted anyway and should be rewritten. That way we win on all fronts. The user can still keep working on the file and doesn't need to wait for another version of cdms2, but at the same time the experience will be limited, enticing the user to do the right thing and rewrite the data.
I like PD's compromise. If the data could be read into memory with a warning it could be possible to fix things (dress it properly) within CDAT rather than having to rely on something else to first rewrite it. BTW, this is a very popular dataset... we should contact them and explain their data is not CF compliant before their habits are copied.
@doutriaux1 exactly. I am a user and I try to write CF-compliant and well-described files. But this particular issue was due to @gleckler1 trying to read a file that someone else had produced, and he was unable to even open the file using cdms. The only alternative in this case is to use another I/O library altogether, which is really something that you want to minimize at all costs.
I think a warning message should be sent to the screen in such a case, to at a minimum point out to a user that further manipulation of the TV is not going to be seamless. It would also be great if the other TV functions would throw a warning if the missing metadata caused problems with the computation, but that is of tertiary importance to successfully opening and reading a file.
If the coordinates attribute is set, I can try to retrieve the lon/lat variables and rename them to the dimension names. This way the user will have a dressed transient variable. This should not be too difficult.
If the coordinates attribute does not exist, then I need to change the rectangular grid function to create a generic rectangular grid if the lat/lon rank is 1-D. That should do the trick.
I would add a lot of warnings to make sure users are aware of what happened. Then they will be able to run cfchecker and figure out what to change.
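A rough sketch of that fallback logic (a toy model for illustration, not the actual cdms2 internals): for each dimension of a variable, prefer a true coordinate variable x(x), otherwise fall back to a 1-D variable named in the coordinates attribute.

```python
# Toy model (assumption: dicts stand in for the real file; names such as
# resolve_axes are illustrative, not cdms2 API).
def resolve_axes(var_dims, coordinates_attr, variables):
    """For each dimension of a variable, prefer a proper coordinate variable
    x(x); otherwise fall back to a 1-D variable listed in 'coordinates'
    that is defined on that dimension; otherwise leave the axis undressed."""
    listed = coordinates_attr.split() if coordinates_attr else []
    axes = {}
    for dim in var_dims:
        if variables.get(dim) == (dim,):           # proper coordinate variable
            axes[dim] = dim
            continue
        for name in listed:                        # fallback via coordinates attr
            if variables.get(name) == (dim,):
                axes[dim] = name
                break
        else:
            axes[dim] = None                       # undressed: index axis only
    return axes

variables = {"latitude": ("nlat",), "longitude": ("nlon",), "time": ("time",)}
print(resolve_axes(("nlat", "nlon"), "longitude latitude time", variables))
# -> {'nlat': 'latitude', 'nlon': 'longitude'}
```

This reproduces the behavior described in the warnings later in the thread: "latitude" found in the coordinates attribute is used for the nlat dimension, and "longitude" for nlon.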
@gleckler1 do you want to contact them or should I?
I'm o.k. with returning an undressed transient variable, but I don't think you should do anything to try to correct it. We want the person who created the file to make it CF compliant (and adhere to the netCDF best practices). The file is not compliant with CF, so we shouldn't try to "fix" it to become CF compliant. We should, of course, tell the user we encountered a problem, via warnings such as:
The netCDF best practices say "Make coordinate variables for every dimension possible (except for string length dimensions)". CDMS finds "nlat" and "nlon" out of compliance with best practices.
The CF convention states: "If the longitude, latitude, vertical or time coordinate is multi-valued, varies in only one dimension, and varies independently of other spatiotemporal coordinates, it is not permitted to store it as an auxiliary coordinate variable." (i.e., it must be stored as a regular coordinate variable (defined as a one-dimensional variable with the same name as its dimension [e.g., time(time) ]). CDMS finds "longitude" and "latitude" out of compliance with this CF convention.
CDMS is unable to "decorate" the variable (or whatever wording would make sense to a novice), so an "undressed" transient variable is being returned.
CDMS should not make any attempt to create grids, identify longitude and latitude, etc. from the file. If you do this, you are enabling folks to get way too sloppy, and that defeats the purpose of a convention.
Note that if the "coordinates" attribute had been left out of the above example, CDMS should probably respond in the same way. Does anyone know what would actually happen in that case? Would it core dump or be happy since the "coordinates" attribute was missing?
@dnadeau4 @durack1 @doutriaux1 @gleckler1 -- forgot to ping you all re the above comment.
I think an undressed TV is sufficient if we can also retrieve the axes separately. It may be work to put the clothes back on, but better if we can do that within our tools. I do think we should contact the dataset curator... this might have most weight if it comes from KET, unless he prefers PG to do it. It would be good to cc some of the key players, at a minimum Adler and Huffman. From http://eagle1.umd.edu/GPCP_ICDR/GPCP_Background.html, the dataset curator is:
Dr. Jian-Jian Wang ESSIC, University of Maryland College Park College Park, MD 20742 USA Phone: +1 301-405-4887 jjwang@umd.edu
I agree completely with @gleckler1: the tool should be able to read anything, or at least provide a user with the ability to read any properly formed (so not garbled in download) netCDF file. Once the file is opened, a user should be able to access any of the file variables, so read a variable called myfavoritelon rather than the well-named and currently expected variable lon.
Responding to @taylor13, I personally think that redressing a variable is not a priority, as it would be near impossible to anticipate all cases. Reading all files, and all variables, is the key motivation here.
An important consideration for CDMS development is whether we expect it to continue to be used as part of the official CF checker (developed elsewhere). If so, we will need CDMS to reflect (and not alter) what's in the file. If we infer information from file metadata in ways not mandated by the CF conventions, and then pass the structure to the checker, the checker will assume we got the information through CF rules and find the file in conformance. This would be wrong.
Perhaps Denis' error message for the case considered here could be passed along to the checker to make it clear to the checker that the file has been adulterated.
Here are my 2 cents.
This was the old CDMS output.
Traceback (most recent call last):
File "glecker.py", line 8, in <module>
f=cdms2.open("gpcp_cdr_v23rB1_y2016_m08.nc")
File "/software/anaconda2/envs/cmor3/lib/python2.7/site-packages/cdms2/dataset.py", line 212, in openDataset
file1 = CdmsFile(path,"r")
File "/software/anaconda2/envs/cmor3/lib/python2.7/site-packages/cdms2/dataset.py", line 1016, in __init__
grid = FileGenericGrid(lat, lon, gridname, parent=self, maskvar=maskvar)
File "/software/anaconda2/envs/cmor3/lib/python2.7/site-packages/cdms2/gengrid.py", line 305, in __init__
AbstractGenericGrid.__init__(self, latAxis, lonAxis, id, maskvar, tempmask, node)
File "/software/anaconda2/envs/cmor3/lib/python2.7/site-packages/cdms2/gengrid.py", line 22, in __init__
raise CDMSError, 'Latitude and longitude axes must have the same shape.'
cdms2.error.CDMSError: Latitude and longitude axes must have the same shape.
This is the new one, which allows the file to be used.
/software/anaconda2/envs/devel/lib/python2.7/site-packages/cdms2/dataset.py:1173: UserWarning:
** Variable "latitude" found in the "coordinates" attribute will be used as coordinate variable
** CDMS could not find the coordinate variable "nlat(nlat)"
** Verify that your file is CF-1 compliant!
warnings.warn(msg)
/software/anaconda2/envs/devel/lib/python2.7/site-packages/cdms2/dataset.py:1173: UserWarning:
** Variable "longitude" found in the "coordinates" attribute will be used as coordinate variable
** CDMS could not find the coordinate variable "nlon(nlon)"
** Verify that your file is CF-1 compliant!
warnings.warn(msg)
But running cfchecker.py on this file seems to introduce erroneous warnings about missing_value in the bounds variables.
cfchecks -v1.6 gpcp_cdr_v23rB1_y2016_m08.nc
CHECKING NetCDF FILE: gpcp_cdr_v23rB1_y2016_m08.nc
=====================
/software/anaconda2/envs/devel/lib/python2.7/site-packages/cdms2/dataset.py:1173: UserWarning:
** Variable "latitude" found in the "coordinates" attribute will be used as coordinate variable
** CDMS could not find the coordinate variable "nlat(nlat)"
** Verify that your file is CF-1 compliant!
warnings.warn(msg)
/software/anaconda2/envs/devel/lib/python2.7/site-packages/cdms2/dataset.py:1173: UserWarning:
** Variable "longitude" found in the "coordinates" attribute will be used as coordinate variable
** CDMS could not find the coordinate variable "nlon(nlon)"
** Verify that your file is CF-1 compliant!
warnings.warn(msg)
Using CF Checker Version 2.0.9
Checking against CF Version CF-1.6
Using Standard Name Table Version 44 (2017-05-23T11:17:23Z)
Using Area Type Table Version 6 (22 February 2017)
------------------
Checking variable: lat_bounds
------------------
ERROR: Attribute missing_value of incorrect type
WARNING (7.1): Boundary Variable lat_bounds should not have missing_value attribute
------------------
Checking variable: time_bounds
------------------
WARNING (3): No standard_name or long_name attribute specified
ERROR: Attribute missing_value of incorrect type
------------------
Checking variable: time
------------------
------------------
Checking variable: longitude
------------------
WARNING: attribute missing_value attached to wrong kind of variable
------------------
Checking variable: precip
------------------
ERROR (3.3): Invalid standard_name: precipitation
ERROR (3.3): Invalid standard_name modifier: amount
ERROR (7.3): Invalid 'name' in cell_methods attribute: precip
------------------
Checking variable: lon_bounds
------------------
ERROR: Attribute missing_value of incorrect type
WARNING (7.1): Boundary Variable lon_bounds should not have missing_value attribute
------------------
Checking variable: latitude
------------------
WARNING: attribute missing_value attached to wrong kind of variable
------------------
Checking variable: precip_error
------------------
ERRORS detected: 5
WARNINGS given: 5
INFORMATION messages: 0
I deleted the forcing of missing_value for fvariable and was able to get the right report from cfchecks.
CHECKING NetCDF FILE: gpcp_cdr_v23rB1_y2016_m08.nc
=====================
/software/anaconda2/envs/devel/lib/python2.7/site-packages/cdms2/dataset.py:1173: UserWarning:
** Variable "latitude" found in the "coordinates" attribute will be used as coordinate variable
** CDMS could not find the coordinate variable "nlat(nlat)"
** Verify that your file is CF-1 compliant!
warnings.warn(msg)
/software/anaconda2/envs/devel/lib/python2.7/site-packages/cdms2/dataset.py:1173: UserWarning:
** Variable "longitude" found in the "coordinates" attribute will be used as coordinate variable
** CDMS could not find the coordinate variable "nlon(nlon)"
** Verify that your file is CF-1 compliant!
warnings.warn(msg)
Using CF Checker Version 2.0.9
Checking against CF Version CF-1.6
Using Standard Name Table Version 44 (2017-05-23T11:17:23Z)
Using Area Type Table Version 6 (22 February 2017)
------------------
Checking variable: lat_bounds
------------------
------------------
Checking variable: time_bounds
------------------
WARNING (3): No standard_name or long_name attribute specified
------------------
Checking variable: time
------------------
------------------
Checking variable: longitude
------------------
WARNING: attribute missing_value attached to wrong kind of variable
------------------
Checking variable: precip
------------------
ERROR (3.3): Invalid standard_name: precipitation
ERROR (3.3): Invalid standard_name modifier: amount
ERROR (7.3): Invalid 'name' in cell_methods attribute: precip
------------------
Checking variable: lon_bounds
------------------
------------------
Checking variable: latitude
------------------
WARNING: attribute missing_value attached to wrong kind of variable
------------------
Checking variable: precip_error
------------------
ERRORS detected: 2
WARNINGS given: 3
INFORMATION messages: 0
That looks much better, and although I don't see a Python prompt I gather that the undressed array is read into memory. Do I have that right?
While it's great to verify that the data has a CF-compliant grid structure, I don't think the standard name should be checked: when CDMS is being used for research we often save obscure diagnostics that don't have standard names.
Denis, did you mention previously that the CF checker is no longer using CDMS?
The CF checker is using CDMS, and the array is read by cfchecks into memory.
The time_bounds variable was not attached via a time:bounds attribute, which is why cfchecks says that standard_name is missing.
So we found another problem with that file, but at least CDMS was able to open it.
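In CF, a bounds variable is tied to its coordinate only through a bounds attribute on the coordinate variable, so the repair for this particular problem is to set time:bounds = "time_bounds" in the file (with NCO that would be something like ncatted -a bounds,time,c,c,"time_bounds" file.nc). A minimal sketch with attributes modeled as plain dicts (an illustration, not real file editing):

```python
# Attributes of the file's time variable, modeled as a dict (assumption:
# toy model for illustration; a real fix would edit the netCDF file itself).
time_attrs = {"standard_name": "time",
              "units": "days since 1970-01-01 00:00:00 0:00"}

def attach_bounds(coord_attrs, bounds_var):
    """Return a copy of the coordinate's attributes with the CF bounds link."""
    fixed_attrs = dict(coord_attrs)
    fixed_attrs["bounds"] = bounds_var   # e.g. time:bounds = "time_bounds"
    return fixed_attrs

fixed_attrs = attach_bounds(time_attrs, "time_bounds")
print(fixed_attrs["bounds"])
# -> time_bounds
```

With that attribute present, a checker can see that time_bounds is a bounds variable and stop expecting a standard_name/long_name on it.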
@dnadeau4 what is the output of f.variable (or is it variables?) after the file is opened with the warnings?
>>> import cdms2
>>> f=cdms2.open("gpcp_cdr_v23rB1_y2016_m08.nc")
/software/anaconda2/envs/devel/lib/python2.7/site-packages/cdms2/dataset.py:1173: UserWarning:
** Variable "latitude" found in the "coordinates" attribute will be used as coordinate variable
** CDMS could not find the coordinate variable "nlat(nlat)"
** Verify that your file is CF-1 compliant!
warnings.warn(msg)
/software/anaconda2/envs/devel/lib/python2.7/site-packages/cdms2/dataset.py:1173: UserWarning:
** Variable "longitude" found in the "coordinates" attribute will be used as coordinate variable
** CDMS could not find the coordinate variable "nlon(nlon)"
** Verify that your file is CF-1 compliant!
warnings.warn(msg)
>>> f.listvariables()
['lat_bounds', 'nlat', 'time_bounds', 'longitude', 'precip', 'lon_bounds', 'latitude', 'nlon', 'precip_error']
In this case,
nlat == longitude
nlon == latitude
I have just read this long thread, and I agree that it would be nice if cdms2 could at least return a numpy array when reading problematic datasets, so that I don't have to read the netCDF4 documentation to read them. Some best-effort behavior plus enough warnings sounds good!
Now, if the problematic axes (or whatever passes for axes) don't come attached to the problematic variables, will I also be able to read these "axes" as variables (i.e. get the axes' values into memory)? I have a vague memory of having trouble reading axes independently of variables.
Also, this mail mentioning the CF convention has reminded me of problems I have sometimes had with packed data. Details in #140.
I created some transientAxis objects for such a case. The solution is simple; it just took me a while to understand the architecture.
import cdms2
f=cdms2.open("gpcp_cdr_v23rB1_y2016_m08.nc")
data=f['precip']
data.getAxis(0)[:]
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.,
11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21.,
22., 23., 24., 25., 26., 27., 28., 29., 30., 31., 32.,
33., 34., 35., 36., 37., 38., 39., 40., 41., 42., 43.,
44., 45., 46., 47., 48., 49., 50., 51., 52., 53., 54.,
55., 56., 57., 58., 59., 60., 61., 62., 63., 64., 65.,
66., 67., 68., 69., 70., 71.], dtype=float32)
data.getAxis(1)[:]
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8.,
9., 10., 11., 12., 13., 14., 15., 16., 17.,
18., 19., 20., 21., 22., 23., 24., 25., 26.,
27., 28., 29., 30., 31., 32., 33., 34., 35.,
36., 37., 38., 39., 40., 41., 42., 43., 44.,
45., 46., 47., 48., 49., 50., 51., 52., 53.,
54., 55., 56., 57., 58., 59., 60., 61., 62.,
63., 64., 65., 66., 67., 68., 69., 70., 71.,
72., 73., 74., 75., 76., 77., 78., 79., 80.,
81., 82., 83., 84., 85., 86., 87., 88., 89.,
90., 91., 92., 93., 94., 95., 96., 97., 98.,
99., 100., 101., 102., 103., 104., 105., 106., 107.,
108., 109., 110., 111., 112., 113., 114., 115., 116.,
117., 118., 119., 120., 121., 122., 123., 124., 125.,
126., 127., 128., 129., 130., 131., 132., 133., 134.,
135., 136., 137., 138., 139., 140., 141., 142., 143.], dtype=float32)
BRAVO!
I have just reviewed this issue and while you can now read the data (and even plot it -- very nice), I don't think the error messages are sufficiently explicit. Here are my comments/questions:
1) CDMS should not automatically attach a "missing_value" attribute to any variable, and as you found out, this will produce an error in the case of coordinate variables.
2) The CF checker should (and does) throw a warning message when both the standard name and the long name are missing (because although neither is required by CF, at least one of them is "strongly recommended"). For "obscure" variables that haven't yet been assigned a standard_name, you should define the long_name; then you won't get a warning from the CF checker. In the case of precip, of course, the standard_name exists and in the file was just incorrect, as the CF checker pointed out.
3) As some of you may have noticed, besides the problem with the coordinate attributes not being CF compliant, there are other problems, including
4) I don't think the error message you provide is strong enough. It says to "** Verify that your file is CF-1 compliant!". Then you run the Checker, and it doesn't say anything about the coordinates being non-compliant, so as a user, I would assume that this is not a problem. We want the user to know when their files are non-compliant by running the checker, but the checker doesn't raise an error because you fixed the problem after you read the data in. Your error message should say something more explicit like:
** Your file does not conform to CF standards or even the netCDF best practices.
CDMS has corrected certain problems, but you should rewrite your file to make
it CF-compliant by correcting the problem with your coordinates. Note that
since your file's data structures have been altered by CDMS, the CF checker
cannot be trusted to expose all the errors in your file.
5) The result of your fix is: >>> f.listvariables() ['lat_bounds', 'nlat', 'time_bounds', 'longitude', 'precip', 'lon_bounds', 'latitude', 'nlon', 'precip_error']. I presume scalars like "nv" are omitted from this list. In the original, nlon and nlat are scalars, but you've made them into coordinate variables, so you've clearly altered the structure of the file. That's why the CF checker doesn't throw an error about the coordinates.
@taylor13 Regarding missing_values set automatically: I am slowly getting ready for the wrath of Ken. As for long_name, I should not mess with what is in the file. This is also the reason why we removed the use of the coordinates attribute to regenerate the grid; you said it could make the user sloppy.
f.listvariables()
['lat_bounds', 'time_bounds', 'longitude', 'precip', 'lon_bounds', 'latitude', 'precip_error']
f.listdimension()
['nlat', 'nlon', 'nv', 'time']
I don't think you should be correcting their files either. I was just pointing out all the things wrong with the original file.
concerning 4 ... What I was trying to say is that you should provide a very strong warning. Getting rid of the warning is not the answer. We want to encourage them to correct the file, not have CDMS correct it. I'm fine with CDMS being able to read it now, but CDMS should point out that they were very sloppy in creating their file, and that you had to generate coordinate axes for them in order to read and plot the data.
If no one is using the old CF checker, then this is less important, but if some folks still use it to check their files, then CDMS must tell them when their files are out of compliance. You have hidden the problems with the files from the checker.
If we don't want CDMS used as a checker, we should make sure no one can download the CF-checker version that is based on CDMS.
Just to reiterate ... As I understand it, PrePARE uses CDMS to read files destined for the CMIP6 archive, and in one of its "checks", PrePARE runs the "old" version of the CF-Checker. If CDMS alters the data structures or "decorates" variables it finds in the files before performing its checks, then it won't really be evaluating the original file for conformance, but what CDMS generates from that file. We must therefore be sure that if CDMS makes any changes to the data (like correcting the coordinate information), then the user is informed (with an explicit warning) that the CF checker could be fooled into thinking the file is CF-compliant. That is why I suggested in item 4 of https://github.com/UV-CDAT/cdms/issues/105#issuecomment-320301200 that an explicit warning be raised.
Just an FYI, the CF-checker that many folks use is accessible online at http://puma.nerc.ac.uk/cgi-bin/cf-checker.pl - that version is using an older version of CDMS2. They have rewritten the checker to use netcdf4-python and that version (linked off the page above) is found at http://pumatest.nerc.ac.uk/cgi-bin/cf-checker-dev.pl
Data can be retrieved via wget from the following site: http://eagle1.umd.edu/GPCP_ICDR/GPCP_Monthly.html
The issue appears to be the dimension of the lat and lon bounds, e.g., lat_bounds(nlat, 2)
import cdms2
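A related detail visible in the header: lat_bounds(nlat, nv) and lon_bounds(nlon, nv) already have the CF-expected shape for 1-D coordinate bounds, (n, 2) with the size-2 vertex dimension last, whereas time_bounds(nv, time) has the dimensions reversed. A quick shape check on a toy model:

```python
# CF expects 1-D coordinate bounds shaped (n, 2), vertex dimension last
# (assumption: toy model; dimension names as in the ncdump header above).
def bounds_shape_ok(bounds_dims, coord_dim, vertex_dim="nv"):
    """True when the bounds variable is dimensioned (coord_dim, vertex_dim)."""
    return bounds_dims == (coord_dim, vertex_dim)

print(bounds_shape_ok(("nlat", "nv"), "nlat"))   # lat_bounds(nlat, nv)
# -> True
print(bounds_shape_ok(("nv", "time"), "time"))   # time_bounds(nv, time)
# -> False
```

So besides the missing coordinate variables, time_bounds would also need its dimensions reordered to (time, nv) for a fully CF-compliant rewrite.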