CDAT / cdms


cdscan issues with poorly formed netcdf files #430

Open durack1 opened 3 years ago


This redirects the issue described in https://github.com/pochedls/xagg/issues/33

cdscan has problems with poorly formed netCDF files. These files contain valid data but are poorly defined: for example, a time-invariant (fixed) field stored in a file that still declares a time dimension with no values. For the example below, cdms2 can read the areacello variable from the file, but cdscan throws an error. For comparison, an ncdump of a validly formed file is included at the bottom of this issue.

$ ncdump -ct ~/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/piControl/r1i1p1f2/Ofx/areacello/gn/v20180814/areacello_Ofx_CNRM-CM6-1_piControl_r1i1p1f2_gn.nc
netcdf areacello_Ofx_CNRM-CM6-1_piControl_r1i1p1f2_gn {
dimensions:
    axis_nbounds = 2 ;
    x = 362 ;
    y = 294 ;
    nvertex = 4 ;
    time = UNLIMITED ; // (0 currently)
variables:
    double lat(y, x) ;
        lat:standard_name = "latitude" ;
        lat:long_name = "Latitude" ;
...
    double lon(y, x) ;
        lon:standard_name = "longitude" ;
        lon:long_name = "Longitude" ;
...
    double bounds_lon(y, x, nvertex) ;
    double bounds_lat(y, x, nvertex) ;
    float areacello(y, x) ;
        areacello:standard_name = "cell_area" ;
        areacello:long_name = "Grid-Cell Area" ;
        areacello:units = "m2" ;
...
        areacello:history = "none" ;

// global attributes:
...
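
The suspect part of the header above is the `time = UNLIMITED ; // (0 currently)` dimension, which none of the variables shown in the dump uses. As a rough sketch of how such files could be flagged before scanning, the snippet below works on plain dictionaries transcribed from the visible part of the dump (the elided `...` sections are not filled in). The helper name `unused_empty_dimensions` is hypothetical and is not part of cdscan or cdms2.

```python
def unused_empty_dimensions(dimensions, variables):
    """Return names of zero-length dimensions that no variable references.

    dimensions: mapping of dimension name -> current length
    variables:  mapping of variable name -> tuple of dimension names
    """
    used = {dim for dims in variables.values() for dim in dims}
    return sorted(
        name for name, size in dimensions.items()
        if size == 0 and name not in used
    )


# Transcribed from the visible lines of the ncdump above (assumption:
# the elided parts of the header do not add variables that use "time").
dimensions = {"axis_nbounds": 2, "x": 362, "y": 294, "nvertex": 4, "time": 0}
variables = {
    "lat": ("y", "x"),
    "lon": ("y", "x"),
    "bounds_lon": ("y", "x", "nvertex"),
    "bounds_lat": ("y", "x", "nvertex"),
    "areacello": ("y", "x"),
}

suspect = unused_empty_dimensions(dimensions, variables)
print(suspect)  # ['time']
```

With a real file, the two dictionaries would have to be filled from whatever netCDF reader is available; that step is left out here on purpose.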

Steps to reproduce the behavior:

  1. Install CDAT 8.2.1 nompi
  2. Attempt to run cdscan on the file listed above
    (cdat821nompi) bash-4.2$ cdscan -x tmp.xml ~/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/piControl/r1i1p1f2/Ofx/areacello/gn/v20180814/areacello_Ofx_CNRM-CM6-1_piControl_r1i1p1f2_gn.nc
    Finding common directory ...
    Common directory: ~/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/piControl/r1i1p1f2/Ofx/areacello/gn/v20180814/
    Scanning files ...
    ~/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/piControl/r1i1p1f2/Ofx/areacello/gn/v20180814/areacello_Ofx_CNRM-CM6-1_piControl_r1i1p1f2_gn.nc
    Setting reference time units to 
    Traceback (most recent call last):
    File "~/anaconda3/envs/cdat821nompi/bin/cdscan", line 1842, in <module>
    main(sys.argv)
    File "~/anaconda3/envs/cdat821nompi/bin/cdscan", line 1284, in main
    timeIsLinear = (referenceTime[0].lower().split() in
    IndexError: string index out of range
  3. See cdscan error above
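
The log line `Setting reference time units to` is followed by nothing, which suggests the reference time string is empty when cdscan indexes into it. The sketch below only reproduces that class of error and shows one possible guard. It is not cdscan's actual code, and the helper name `first_unit_token` is hypothetical.

```python
def first_unit_token(reference_time):
    """Return the leading word of a units string such as
    'days since 1850-01-01', lowercased, or None if the string is blank."""
    tokens = reference_time.lower().split()
    return tokens[0] if tokens else None


# Indexing an empty string raises the same error seen in the traceback.
try:
    ""[0]
except IndexError as err:
    message = str(err)

print(message)                                    # string index out of range
print(first_unit_token(""))                       # None
print(first_unit_token("Days since 1850-01-01"))  # days
```

A guard of this kind would let a scanner treat a blank reference time as "no usable time axis" instead of crashing; whether that is the right behavior for cdscan is a design decision for the maintainers.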

For comparison, here is an ncdump of a validly formed file (note that no time dimension is defined):

(cdat821nompi) bash-4.2$ ncdump -ct ~/esgf_publish/CMIP6/CMIP/CSIRO-ARCCSS/ACCESS-CM2/1pctCO2/r1i1p1f1/Ofx/areacello/gn/v20191109/areacello_Ofx_ACCESS-CM2_1pctCO2_r1i1p1f1_gn.nc 
netcdf areacello_Ofx_ACCESS-CM2_1pctCO2_r1i1p1f1_gn {
dimensions:
    j = 300 ;
    i = 360 ;
    bnds = 2 ;
    vertices = 4 ;
variables:
    int j(j) ;
        j:units = "1" ;
        j:long_name = "cell index along second dimension" ;
    int i(i) ;
        i:units = "1" ;
        i:long_name = "cell index along first dimension" ;
    double latitude(j, i) ;
        latitude:standard_name = "latitude" ;
        latitude:long_name = "latitude" ;
...
        latitude:bounds = "vertices_latitude" ;
    double longitude(j, i) ;
        longitude:standard_name = "longitude" ;
        longitude:long_name = "longitude" ;
...
        longitude:bounds = "vertices_longitude" ;
    double vertices_latitude(j, i, vertices) ;
        vertices_latitude:units = "degrees_north" ;
...
    double vertices_longitude(j, i, vertices) ;
        vertices_longitude:units = "degrees_east" ;
...
    float areacello(j, i) ;
        areacello:standard_name = "cell_area" ;
        areacello:long_name = "Grid-Cell Area for Ocean Variables" ;
        areacello:comment = "Horizontal area of ocean grid cells" ;
        areacello:units = "m2" ;
...

// global attributes: