SciTools / iris

A powerful, format-agnostic, and community-driven Python package for analysing and visualising Earth science data
https://scitools-iris.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License
625 stars 283 forks source link

Can't read "volcello" files #3367

Closed znicholls closed 4 years ago

znicholls commented 5 years ago

As far as I can tell, iris can't read any "volcello" files e.g.

>>> import iris

# e.g. file (I've also tried other CMIP6 volcello files and get the same behaviour)
>>> volcello_path = "CMIP6/CMIP/NCAR/CESM2-WACCM/piControl/r1i1p1f1/Ofx/volcello/gn/v20190320/volcello_Ofx_CESM2-WACCM_piControl_r1i1p1f1_gn.nc"
>>> iris.load_cube(volcello_path)
.../miniconda3/envs/netcdf-scm/lib/python3.7/site-packages/iris/fileformats/cf.py:798: UserWarning: Missing CF-netCDF measure variable 'areacello', referenced by netCDF variable 'volcello'
  warnings.warn(message % (variable_name, nc_var_name))
Traceback (most recent call last):
  File ".../miniconda3/envs/netcdf-scm/lib/python3.7/site-packages/iris/__init__.py", line 377, in load_cube
    cube = cubes.merge_cube()
  File ".../miniconda3/envs/netcdf-scm/lib/python3.7/site-packages/iris/cube.py", line 374, in merge_cube
    raise ValueError("can't merge an empty CubeList")
ValueError: can't merge an empty CubeList

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../miniconda3/envs/netcdf-scm/lib/python3.7/site-packages/iris/__init__.py", line 381, in load_cube
    raise iris.exceptions.ConstraintMismatchError('no cubes found')
iris.exceptions.ConstraintMismatchError: no cubes found

I did some digging and I think the reason is that "volcello" is being identified as a iris.fileformats.cf.CFMeasureVariable and hence does not appear in iris.fileformats.cf.CFGroup.data_variables (which matters here https://github.com/SciTools/iris/blob/5d4976614d39a93c12e640ff5a2ec1ccdd6e8380/lib/iris/fileformats/netcdf.py#L709)

If someone can suggest a fix, I'm happy to try and implement it (alongside a test). I just have no idea what the logic sitting underneath all the CF stuff is so can't work out how to start.

pp-mo commented 5 years ago

Don't know much about the background of this, but I downloaded a "volcello_fx_GFDL-CM3_historicalMisc_r0i0p0.nc", and I can load that fine. It's dump is like this (in part, it's long. Shame we don't have comment section collapse) ...

$ ncdump -h volcello_fx_GFDL-CM3_historicalMisc_r0i0p0.nc 
netcdf volcello_fx_GFDL-CM3_historicalMisc_r0i0p0 {
dimensions:
    lev = 50 ;
    rlat = 200 ;
    rlon = 360 ;
    vertices = 4 ;
    bnds = 2 ;
variables:
            . . .
    float volcello(lev, rlat, rlon) ;
        volcello:missing_value = 1.e+20f ;
        volcello:coordinates = "lat lon" ;
        volcello:_FillValue = 1.e+20f ;
        volcello:standard_name = "ocean_volume" ;
        volcello:long_name = "Ocean Grid-Cell Volume" ;
        volcello:units = "m-3" ;
        volcello:comments = "Note: This model output is presented on the model\'s tripolar grid. The North Pole singularity is avoided by using this nonregular grid north of 65N. More information about the ocean and sea ice model\'s grid can be found at http://nomads.gfdl.noaa.gov/." ;
        volcello:original_name = "volcello" ;
        volcello:original_units = "<undefined>" ;
        volcello:associated_files = "baseURL: http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation gridspecFile: gridspec_ocean_fx_GFDL-CM3_historicalMisc_r0i0p0.nc" ;
    . . .
}

Is it really a volcello file that won't load, or a file that refers to one ?? If so it could be a problem with the concept of 'associated files' For instance, I see files in the archives with structures like ...

    float mydata(time, plev, lat, lon) ;
            . . .
        mydata:cell_measures = "volume: volcello" ;
        mydata:associated_files = "baseURL: http://cmip-pcmdi.llnl.gov/xxx volcello: volcello_xxxx.nc" ;

The meaning is that there is a cross-reference between files.
I'm pretty sure Iris doesn't know how to do that, though I'm surprised if this stops you loading something. I assume your case must be a bit different.

An ncdump would really help here Can you give a bit more detail @znicholls ?

znicholls commented 5 years ago

Hi @pp-mo, here's an ncdump (it's the full thing as I don't know what I should be looking at, sorry it's so long). If any more information is helpful let me know.

ncdump -h volcello_Ofx_CESM2-WACCM_piControl_r1i1p1f1_gn.nc 
netcdf volcello_Ofx_CESM2-WACCM_piControl_r1i1p1f1_gn {
dimensions:
    nlat = 384 ;
    d2 = 2 ;
    nlon = 320 ;
    lev = 60 ;
    vertices = 4 ;
variables:
    float volcello(lev, nlat, nlon) ;
        volcello:_FillValue = 1.e+20f ;
        volcello:cell_measures = "area: areacello volume: volcello" ;
        volcello:cell_methods = "area: mean" ;
        volcello:comment = "For oceans with more than 1 mesh (e.g. staggered grids), report areas that apply to surface vertical fluxes of energy. If this field is time-dependent then save it instead as one of your Omon and Odec fields" ;
        volcello:coordinates = "lev lat lon" ;
        volcello:description = "For oceans with more than 1 mesh (e.g. staggered grids), report areas that apply to surface vertical fluxes of energy. If this field is time-dependent then save it instead as one of your Omon and Odec fields" ;
        volcello:frequency = "fx" ;
        volcello:id = "volcello" ;
        volcello:long_name = "Ocean Grid-Cell Volume" ;
        volcello:mipTable = "Ofx" ;
        volcello:missing_value = 1.e+20 ;
        volcello:out_name = "volcello" ;
        volcello:prov = "fx ((isd.003))" ;
        volcello:realm = "ocean" ;
        volcello:standard_name = "ocean_volume" ;
        volcello:time_label = "None" ;
        volcello:time_title = "No temporal dimensions ... fixed field" ;
        volcello:title = "Ocean Grid-Cell Volume" ;
        volcello:type = "real" ;
        volcello:units = "m3" ;
        volcello:variable_id = "volcello" ;
    double lat(nlat, nlon) ;
        lat:axis = "Y" ;
        lat:bounds = "lat_bnds" ;
        lat:standard_name = "latitude" ;
        lat:title = "Latitude" ;
        lat:type = "double" ;
        lat:units = "degrees_north" ;
        lat:valid_max = 90. ;
        lat:valid_min = -90. ;
    double lev(lev) ;
        lev:axis = "Z" ;
        lev:bounds = "lev_bnds" ;
        lev:positive = "down" ;
        lev:standard_name = "olevel" ;
        lev:title = "ocean model level" ;
        lev:type = "double" ;
        lev:units = "centimeters" ;
    double lon(nlat, nlon) ;
        lon:axis = "X" ;
        lon:bounds = "lon_bnds" ;
        lon:standard_name = "longitude" ;
        lon:title = "Longitude" ;
        lon:type = "double" ;
        lon:units = "degrees_east" ;
        lon:valid_max = 360. ;
        lon:valid_min = 0. ;
    int nlat(nlat) ;
        nlat:long_name = "cell index along second dimension" ;
        nlat:units = "1" ;
    int nlon(nlon) ;
        nlon:long_name = "cell index along first dimension" ;
        nlon:units = "1" ;
    float lat_bnds(nlat, nlon, vertices) ;
        lat_bnds:units = "degrees_north" ;
    float lon_bnds(nlat, nlon, vertices) ;
        lon_bnds:units = "degrees_east" ;
    float lev_bnds(lev, d2) ;
        lev_bnds:units = "m" ;

// global attributes:
        :Conventions = "CF-1.7 CMIP-6.2" ;
        :activity_id = "CMIP" ;
        :case_id = "1" ;
        :cesm_casename = "b.e21.BW1850.f09_g17.CMIP6-piControl.001" ;
        :contact = "cesm_cmip6@ucar.edu" ;
        :creation_date = "2019-01-29T19:10:22Z" ;
        :data_specs_version = "01.00.29" ;
        :experiment = "pre-industrial control" ;
        :experiment_id = "piControl" ;
        :external_variables = "areacello volcello" ;
        :forcing_index = 1LL ;
        :frequency = "fx" ;
        :further_info_url = "https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2-WACCM.piControl.none.r1i1p1f1" ;
        :grid = "native gx1v7 displaced pole grid (384x320 latxlon)" ;
        :grid_label = "gn" ;
        :initialization_index = 1LL ;
        :institution = "National Center for Atmospheric Research, Climate and Global Dynamics Laboratory, 1850 Table Mesa Drive, Boulder, CO 80305, USA" ;
        :institution_id = "NCAR" ;
        :license = "CMIP6 model data produced by <The National Center for Atmospheric Research> is licensed under a Creative Commons Attribution-[]ShareAlike 4.0 International License (https://creativecommons.org/licenses/). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file)[]. The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law." ;
        :mip_era = "CMIP6" ;
        :model_doi_url = "https://doi.org/10.5065/D67H1H0V" ;
        :nominal_resolution = "100 km" ;
        :parent_mip_era = "CMIP6" ;
        :parent_source_id = "CESM2-WACCM" ;
        :parent_time_units = "days since 0001-01-01 00:00:00" ;
        :physics_index = 1LL ;
        :product = "model-output" ;
        :realization_index = 1LL ;
        :realm = "ocean" ;
        :source = "CESM2 (2017): atmosphere: CAM6 (0.9x1.25 finite volume grid; 288 x 192 longitude/latitude; 70 levels; top level 4.5e-6 mb); ocean: POP2 (320x384 longitude/latitude; 60 levels; top grid cell 0-10 m); sea_ice: CICE5.1 (same grid as ocean); land: CLM5 0.9x1.25 finite volume grid; 288 x 192 longitude/latitude; 70 levels; top level 4.5e-6 mb); aerosol: MAM4 (0.9x1.25 finite volume grid; 288 x 192 longitude/latitude; 70 levels; top level 4.5e-6 mb); atmosChem: WACCM (0.9x1.25 finite volume grid; 288 x 192 longitude/latitude; 70 levels; top level 4.5e-6 mb; landIce: CISM2.1; ocnBgchem: MARBL (320x384 longitude/latitude; 60 levels; top grid cell 0-10 m)" ;
        :source_id = "CESM2-WACCM" ;
        :source_type = "AOGCM BGC CHEM AER" ;
        :sub_experiment = "none" ;
        :sub_experiment_id = "none" ;
        :table_id = "Ofx" ;
        :tracking_id = "hdl:21.14100/3ad6d7f4-6ccd-4c61-ae62-018d0fdc079e" ;
        :variable_id = "volcello" ;
        :variant_info = "CMIP6 CESM2 piControl experiment with high-top atmosphere (WACCM6) with interactive chemistry (TSMLT1), interactive land (CLM5), coupled ocean (POP2) with biogeochemistry (MARBL), interactive sea ice (CICE5.1), and non-evolving land ice (CISM2.1)" ;
        :variant_label = "r1i1p1f1" ;
        :parent_experiment_id = "piControl-spinup" ;
        :parent_activity_id = "CMIP" ;
        :parent_variant_label = "r1i1p1f1" ;
        :branch_time_in_parent = 48545. ;
        :branch_time_in_child = 0. ;
        :branch_method = "standard" ;
}
pp-mo commented 5 years ago

@znicholls Some sample metadata to look at

Thanks ! It looks like the problem is that the main variable is not only itself called 'volcello', but has the attribute volcello:cell_measures = "area: areacello volume: volcello" ; So, it appears to reference itself as a cell measure variable.

When I rename it, the file load ok in Iris.

Are you confident this is actually correct, and/or is this a standard gridfile output used by some model ??

ehogan commented 5 years ago

There's a discussion about this here: https://github.com/cmip6dr/CMIP6_DataRequest_VariableDefinitions/issues/394

znicholls commented 5 years ago

@ehogan is it also your understanding that the changes proposed in https://github.com/cmip6dr/CMIP6_DataRequest_VariableDefinitions/issues/394 would not fix the problem identified here?

ehogan commented 5 years ago

@znicholls yes, it looks like the volume: volcello will continue to be part of the cell_measures for volcello (for consistency with other variables with this cell volume). It would be great if Iris had the ability to read these files :)

pp-mo commented 5 years ago

a discussion about this here

Nice spot @ehogan !

Ok, I also chased up the CESM2 downloads from NCAR, from "b.e21.B1850.f09_g17.CMIP6-deforest-globe.001" --> file volcello_Ofx_CESM2_deforest-globe_r1i1p1f1_gn.nc And it does have this same problem.

It does look like it would be a good idea for Iris to tolerate this quirk, after all. So I wouldn't close this just yet ...

znicholls commented 5 years ago

ok great. @pp-mo I'm happy to do a PR if you can point me to the bit of the code I should look at. My initial hack solution would be to just drop this cell measure if it's a volcello file but maybe you can see a better way.

pp-mo commented 5 years ago

I'm happy to do a PR if you can point me to the bit of the code I should look at.

Thanks @znicholls !

I've been looking at the code, it's a bit nasty with a lot of boilerplate, due to some difficulties generalising the classification operation in each subclass of iris.fileformats.cf.CfVariable. For this very limited purpose, though, I think you can probably just change around this line, so it applies to the CfMeasureVariable class alone. Roughly, change if variable_name not in ignore: to if variable_name not in ignore and variable_name != nc_var_name:

The idea is that a cell_measure reference to a candidate "doesn't count" if it is found on the variable itself.

Well, that's just my hint. No promise that this works !

znicholls commented 5 years ago

Cheers @pp-mo ! Is it possible to get a volcello file added to the iris test data or should I write one using netCDF-4 as part of a test setup instead?

pp-mo commented 5 years ago

It's up to you really, depending how complicated on-the-fly creation looks. A "real" file might have more credibility? To add to iris-test-data ...

znicholls commented 5 years ago

I've started working on a fix (using on-the-fly creation first), see #3386 .

  • try to keep it small-ish

The smallest volcello file I can find in the CMIP6 database is 8Mb. Is that too big?

pp-mo commented 5 years ago

8Mb. Is that too big?

Sounds ok to me. iris-test-data content is currently about 340Mb total.