SciTools / iris

A powerful, format-agnostic, and community-driven Python package for analysing and visualising Earth science data
https://scitools-iris.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Support external_variables for cell_measures #3329

Closed bjlittle closed 8 months ago

bjlittle commented 5 years ago

NetCDF loading needs to support the global external_variables attribute (CF 2.6.3 External Variables), so that cell_measures which reference a NetCDF variable that is not in the same file as the associated cube's NetCDF data variable may be loaded; see 7.2 Cell Measures.

To be honest, in my opinion external_variables is a pretty weak contract for identifying a target file that may contain the required cell measure variable, so some sort of pragmatic implementation is required here. Perhaps it is enough for the user to also pass that file to iris.load et al., so that iris can join the dots and locate the cell measure variable.
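As a rough illustration of the CF mechanics involved, the sketch below parses a cell_measures attribute string and uses external_variables to decide whether each referenced variable is present in the file, declared external, or missing altogether. It is plain Python rather than Iris code; the function names and the sample attribute values are assumptions made for illustration only.

```python
def parse_cell_measures(attr):
    """Parse a CF cell_measures string such as "area: areacello"
    into a {measure: variable_name} dict."""
    tokens = attr.split()
    if len(tokens) % 2 != 0:
        raise ValueError(f"malformed cell_measures attribute: {attr!r}")
    measures = {}
    for key, var_name in zip(tokens[0::2], tokens[1::2]):
        if not key.endswith(":"):
            raise ValueError(f"expected 'measure:' but found {key!r}")
        measures[key[:-1]] = var_name
    return measures


def classify_cell_measures(cell_measures_attr, external_variables_attr, file_variables):
    """Label each referenced variable as "internal" (present in this file),
    "external" (listed in external_variables) or "missing" (neither)."""
    # external_variables is a blank-separated list of variable names.
    external = set(external_variables_attr.split())
    result = {}
    for var_name in parse_cell_measures(cell_measures_attr).values():
        if var_name in file_variables:
            result[var_name] = "internal"
        elif var_name in external:
            result[var_name] = "external"
        else:
            result[var_name] = "missing"
    return result


# Illustrative values only: a data variable that points at a cell measure
# held in some other file.
print(classify_cell_measures("area: areacello", "areacello", {"tos", "time"}))
```

Note that the "external" label only says the variable is expected elsewhere; nothing in the attribute identifies which file holds it, which is the weakness described above.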

bjlittle commented 5 years ago

@zklaus Do you have an ncdump -h example from ESMValTool where the cube data variable is in one file and the cell measures is in another file, and iris loads them both as unrelated cubes?

Or indeed, where both the cube data variable and cell measures variable are in the same file and iris loads them both as unrelated cubes?

zklaus commented 5 years ago

Yes. I'll focus on CMIP here. There, only one data variable per file is permitted, and cell_measures are treated as variables in their own right. Consequently, they can live in their own tables. One typical example is the sea surface temperature (sst), which is called tos (temperature ocean surface) in CMIP. One important cell_measure here is the area of the ocean cells, areacello in CMIP, so that we have, for example in CMIP6:

from tos_Omon_IPSL-CM6A-LR_piControl_r1i1p1f1_gn_185001-234912.nc:

netcdf tos_Omon_IPSL-CM6A-LR_piControl_r1i1p1f1_gn_185001-234912 {
dimensions:
        axis_nbounds = 2 ;
        x = 362 ;
        y = 332 ;
        nvertex = 4 ;
        time = UNLIMITED ; // (6000 currently)
variables:
        float nav_lat(y, x) ;
                nav_lat:standard_name = "latitude" ;
                nav_lat:long_name = "Latitude" ;
                nav_lat:units = "degrees_north" ;
                nav_lat:bounds = "bounds_nav_lat" ;
        float nav_lon(y, x) ;
                nav_lon:standard_name = "longitude" ;
                nav_lon:long_name = "Longitude" ;
                nav_lon:units = "degrees_east" ;
                nav_lon:bounds = "bounds_nav_lon" ;
        float bounds_nav_lon(y, x, nvertex) ;
        float bounds_nav_lat(y, x, nvertex) ;
        float area(y, x) ;
                area:standard_name = "cell_area" ;
                area:units = "m2" ;
                area:coordinates = "nav_lon nav_lat" ;
        double time(time) ;
                time:axis = "T" ;
                time:standard_name = "time" ;
                time:long_name = "Time axis" ;
                time:calendar = "gregorian" ;
                time:units = "days since 1850-01-01 00:00:00" ;
                time:time_origin = "1850-01-01 00:00:00" ;
                time:bounds = "time_bounds" ;
        double time_bounds(time, axis_nbounds) ;
        float tos(time, y, x) ;
                tos:standard_name = "sea_surface_temperature" ;
                tos:long_name = "Sea Surface Temperature" ;
                tos:units = "degC" ;
                tos:online_operation = "average" ;
                tos:cell_methods = "area: mean where sea time: mean" ;
                tos:interval_operation = "2700 s" ;
                tos:interval_write = "1 month" ;
                tos:cell_measures = "area: areacello" ;
                tos:_FillValue = 1.e+20f ;
                tos:missing_value = 1.e+20f ;
                tos:coordinates = "nav_lat nav_lon" ;
                tos:description = "This may differ from \"surface temperature\" in regions of sea ice or floating ice shelves. For models using conservative temperature as the prognostic field, they should report the top ocean layer as surface potential temperature, which is the same as surface in situ temperature." ;
                tos:history = "none" ;

// global attributes:
                :name = "/ccc/work/cont003/dsm/p86maf/IGCM_OUT/IPSLCM6/PROD/piControl/CM61-LR-pi-03/CMIP6/OCE/tos_Omon_IPSL-CM6A-LR_piControl_r1i1p1f1_gn_%start_date%-%end_date%" ;
                :Conventions = "CF-1.7 CMIP-6.2" ;
                :creation_date = "2018-04-27T14:59:53Z" ;
                :tracking_id = "hdl:21.14100/640fb3e1-ddac-4da8-ac5a-339f83df1540" ;
                :description = "DECK: control" ;
                :title = "IPSL-CM6A-LR model output prepared for CMIP6 / CMIP piControl" ;
                :activity_id = "CMIP" ;
                :contact = "ipsl-cmip6@listes.ipsl.fr" ;
                :data_specs_version = "01.00.21" ;
                :dr2xml_version = "1.3" ;
                :experiment_id = "piControl" ;
                :experiment = "pre-industrial control" ;
                :external_variables = "areacello" ;
                :forcing_index = 1 ;
                :frequency = "mon" ;
                :further_info_url = "https://furtherinfo.es-doc.org/CMIP6.IPSL.IPSL-CM6A-LR.piControl.none.r1i1p1f1" ;
                :grid = "native ocean tri-polar grid with 105 k ocean cells" ;
                :grid_label = "gn" ;
                :nominal_resolution = "100 km" ;
                :initialization_index = 1 ;
                :institution_id = "IPSL" ;
                :institution = "Institut Pierre Simon Laplace, Paris 75252, France" ;
                :license = "CMIP6 model data produced by IPSL is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (https://creativecommons.org/licenses). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file) and at https://cmc.ipsl.fr/. The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law." ;
                :mip_era = "CMIP6" ;
                :parent_experiment_id = "piControl-spinup" ;
                :parent_mip_era = "CMIP6" ;
                :parent_activity_id = "CMIP" ;
                :parent_source_id = "IPSL-CM6A-LR" ;
                :parent_time_units = "days since 1750-01-01 00:00:00" ;
                :parent_variant_label = "r1i1p1f1" ;
                :branch_method = "standard" ;
                :branch_time_in_parent = 36524. ;
                :branch_time_in_child = 0. ;
                :physics_index = 1 ;
                :product = "model-output" ;
                :realization_index = 1 ;
                :realm = "ocean" ;
                :source = "IPSL-CM6A-LR (2017):  atmos: LMDZ (NPv6, N96; 144 x 143 longitude/latitude; 79 levels; top level 40000 m) land: ORCHIDEE (v2.0, Water/Carbon/Energy mode) ocean: NEMO-OPA (eORCA1.3, tripolar primarily 1deg; 362 x 332 longitude/latitude; 75 levels; top grid cell 0-2 m) ocnBgchem: NEMO-PISCES seaIce: NEMO-LIM3" ;
                :source_id = "IPSL-CM6A-LR" ;
                :source_type = "AOGCM" ;
                :sub_experiment_id = "none" ;
                :sub_experiment = "none" ;
                :table_id = "Omon" ;
                :variable_id = "tos" ;
                :variant_info = ". Information provided by this attribute may in some cases be flawed. Users can find more comprehensive and up-to-date documentation via the further_info_url global attribute." ;
                :variant_label = "r1i1p1f1" ;
                :EXPID = "piControl" ;
                :CMIP6_CV_version = "cv=6.2.3.5-2-g63b123e" ;
                :dr2xml_md5sum = "00e1a4f623b35a33620b9828c66bd1c8" ;
                :model_version = "6.1.2" ;
                :history = "Tue Jul 10 16:39:30 2018: ncatted -O -a coordinates,area,o,c,nav_lon nav_lat /ccc/work/cont003/dsm/p86maf/IGCM_OUT/IPSLCM6/PROD/piControl/CM61-LR-pi-03/CMIP6/OCE/tos_Omon_IPSL-CM6A-LR_piControl_r1i1p1f1_gn_185001-234912.nc\nnone" ;
}

and for the cell_measure from areacello_Ofx_IPSL-CM6A-LR_piControl_r1i1p1f1_gn.nc:

netcdf areacello_Ofx_IPSL-CM6A-LR_piControl_r1i1p1f1_gn {
dimensions:
        axis_nbounds = 2 ;
        x = 362 ;
        y = 332 ;
        nvertex = 4 ;
        time = UNLIMITED ; // (0 currently)
variables:
        float nav_lat(y, x) ;
                nav_lat:standard_name = "latitude" ;
                nav_lat:long_name = "Latitude" ;
                nav_lat:units = "degrees_north" ;
                nav_lat:bounds = "bounds_nav_lat" ;
        float nav_lon(y, x) ;
                nav_lon:standard_name = "longitude" ;
                nav_lon:long_name = "Longitude" ;
                nav_lon:units = "degrees_east" ;
                nav_lon:bounds = "bounds_nav_lon" ;
        float bounds_nav_lon(y, x, nvertex) ;
        float bounds_nav_lat(y, x, nvertex) ;
        float area(y, x) ;
                area:standard_name = "cell_area" ;
                area:units = "m2" ;
        float areacello(y, x) ;
                areacello:standard_name = "cell_area" ;
                areacello:long_name = "Grid-Cell Area" ;
                areacello:units = "m2" ;
                areacello:online_operation = "once" ;
                areacello:cell_methods = "area: sum" ;
                areacello:cell_measures = "area: area" ;
                areacello:_FillValue = 1.e+20f ;
                areacello:missing_value = 1.e+20f ;
                areacello:coordinates = "nav_lat nav_lon" ;
                areacello:description = "Cell areas for any grid used to report ocean variables and variables which are requested as used on the model ocean grid (e.g. hfsso, which is a downward heat flux from the atmosphere interpolated onto the ocean grid). These cell areas should be defined to enable exact calculation of global integrals (e.g., of vertical fluxes of energy at the surface and top of the atmosphere)." ;
                areacello:history = "none" ;

// global attributes:
                :name = "/ccc/work/cont003/gencmip6/p86maf/IGCM_OUT/IPSLCM6/PROD/piControl/CM61-LR-pi-03f/CMIP6/OCE/areacello_Ofx_IPSL-CM6A-LR_piControl_r1i1p1f1_gn" ;
                :Conventions = "CF-1.7 CMIP-6.2" ;
                :creation_date = "2018-09-11T13:50:32Z" ;
                :tracking_id = "hdl:21.14100/98693731-a306-48d6-909d-b3598a703566" ;
                :description = "DECK: control" ;
                :title = "IPSL-CM6A-LR model output prepared for CMIP6 / CMIP piControl" ;
                :activity_id = "CMIP" ;
                :contact = "ipsl-cmip6@listes.ipsl.fr" ;
                :data_specs_version = "01.00.21" ;
                :dr2xml_version = "1.13" ;
                :experiment_id = "piControl" ;
                :experiment = "pre-industrial control" ;
                :forcing_index = 1 ;
                :frequency = "fx" ;
                :further_info_url = "https://furtherinfo.es-doc.org/CMIP6.IPSL.IPSL-CM6A-LR.piControl.none.r1i1p1f1" ;
                :grid = "native ocean tri-polar grid with 105 k ocean cells" ;
                :grid_label = "gn" ;
                :nominal_resolution = "100 km" ;
                :history = "none" ;
                :initialization_index = 1 ;
                :institution_id = "IPSL" ;
                :institution = "Institut Pierre Simon Laplace, Paris 75252, France" ;
                :license = "CMIP6 model data produced by IPSL is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (https://creativecommons.org/licenses). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file) and at https://cmc.ipsl.fr/. The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law." ;
                :mip_era = "CMIP6" ;
                :parent_experiment_id = "piControl-spinup" ;
                :parent_mip_era = "CMIP6" ;
                :parent_activity_id = "CMIP" ;
                :parent_source_id = "IPSL-CM6A-LR" ;
                :parent_time_units = "days since 1750-01-01 00:00:00" ;
                :parent_variant_label = "r1i1p1f1" ;
                :branch_method = "standard" ;
                :branch_time_in_parent = 36524. ;
                :branch_time_in_child = 0. ;
                :physics_index = 1 ;
                :product = "model-output" ;
                :realization_index = 1 ;
                :realm = "ocean" ;
                :source = "IPSL-CM6A-LR (2017):  atmos: LMDZ (NPv6, N96; 144 x 143 longitude/latitude; 79 levels; top level 40000 m) land: ORCHIDEE (v2.0, Water/Carbon/Energy mode) ocean: NEMO-OPA (eORCA1.3, tripolar primarily 1deg; 362 x 332 longitude/latitude; 75 levels; top grid cell 0-2 m) ocnBgchem: NEMO-PISCES seaIce: NEMO-LIM3" ;
                :source_id = "IPSL-CM6A-LR" ;
                :source_type = "AOGCM BGC" ;
                :sub_experiment_id = "none" ;
                :sub_experiment = "none" ;
                :table_id = "Ofx" ;
                :variable_id = "areacello" ;
                :variant_label = "r1i1p1f1" ;
                :EXPID = "piControl" ;
                :CMIP6_CV_version = "cv=6.2.3.5-2-g63b123e" ;
                :dr2xml_md5sum = "92ddb3d0d8ce79f498d792fc8e559dcf" ;
                :model_version = "6.1.6" ;
}

You might notice that there is an additional variable area in both files. I think this is an error, and that variable should not be present.

Hope this helps.

bjlittle commented 5 years ago

@zklaus Awesome, thanks. Just what I needed!

Yup, the area variable is rather odd?!

For tos_Omon_IPSL-CM6A-LR_piControl_r1i1p1f1_gn_185001-234912.nc, the area variable is defined but not used, i.e. it's not connected to the data variable tos directly or indirectly.

areacello_Ofx_IPSL-CM6A-LR_piControl_r1i1p1f1_gn.nc, which defines the cell_measures variable areacello for tos, gives areacello its own cell_measures attribute, which in turn references area.

Hmmm that just seems wrong to me, but then again, I don't know that much about CMIP data.
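The "defined but not used" observation can be checked mechanically by following the references out from the data variable. The sketch below is plain Python, not Iris code: the dictionary is a hand-transcribed subset of the tos header quoted earlier in this thread, only the dimensions and the coordinates, bounds and cell_measures attributes are followed, and the helper names are hypothetical.

```python
from collections import deque

# Variable name -> (dimensions, attributes that reference other variables),
# transcribed by hand from the tos file header above.
TOS_FILE = {
    "nav_lat": (("y", "x"), {"bounds": "bounds_nav_lat"}),
    "nav_lon": (("y", "x"), {"bounds": "bounds_nav_lon"}),
    "bounds_nav_lon": (("y", "x", "nvertex"), {}),
    "bounds_nav_lat": (("y", "x", "nvertex"), {}),
    "area": (("y", "x"), {"coordinates": "nav_lon nav_lat"}),
    "time": (("time",), {"bounds": "time_bounds"}),
    "time_bounds": (("time", "axis_nbounds"), {}),
    "tos": (
        ("time", "y", "x"),
        {"coordinates": "nav_lat nav_lon", "cell_measures": "area: areacello"},
    ),
}


def referenced_names(dims, attrs):
    """Names a variable points at via its dimensions or attributes."""
    # A dimension with a same-named variable is a coordinate variable.
    names = set(dims)
    names.update(attrs.get("coordinates", "").split())
    names.update(attrs.get("bounds", "").split())
    # cell_measures alternates "measure:" keys and variable names.
    names.update(attrs.get("cell_measures", "").split()[1::2])
    return names


def unreachable_variables(variables, root):
    """Variables in the file that cannot be reached from the root variable."""
    seen = {root}
    queue = deque([root])
    while queue:
        dims, attrs = variables[queue.popleft()]
        for name in referenced_names(dims, attrs):
            # Names absent from the file (e.g. external variables) are skipped.
            if name in variables and name not in seen:
                seen.add(name)
                queue.append(name)
    return sorted(set(variables) - seen)


print(unreachable_variables(TOS_FILE, "tos"))  # ['area']
```

With this transcription, area is the only variable that tos never reaches, while areacello is referenced but absent from the file, consistent with it being listed in external_variables.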

Anyways, thanks @zklaus - your example is golden :smile:

zklaus commented 5 years ago

I agree with you that the area variable and all references to it should probably just be removed. Having said that, if the tos file had no external_variables attribute and area were instead called areacello, then it would be an example of the second kind.

Also, to clarify how I think this should be usable with iris: in my view, iris cannot be expected to find files. Instead, the file containing the cell measure should be passed in the list of files given to load/load_cube, and the cell measure taken from there if possible. That might mean that the external_variables attribute is irrelevant for iris.
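A minimal sketch of that matching idea, assuming the loader has already read the variables from every file it was given: each cell_measures reference is looked up by name among all loaded variables, regardless of which file each came from. This is plain Python and not how Iris is implemented; LoadedVariable and resolve_cell_measures are hypothetical names, and the stray area reference inside the areacello file is left out for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class LoadedVariable:
    """Hypothetical stand-in for a data variable read from one file."""
    var_name: str
    source_file: str
    cell_measures: dict = field(default_factory=dict)  # measure -> variable name


def resolve_cell_measures(loaded):
    """Match each cell_measures reference against every variable passed in.

    Returns (resolved, unresolved). resolved maps (data var_name, measure) to
    the LoadedVariable that supplies it; unresolved lists references with no
    match among the loaded variables.
    """
    # Simplification: if two files define the same name, the last one wins.
    by_name = {var.var_name: var for var in loaded}
    resolved, unresolved = {}, []
    for var in loaded:
        for measure, target_name in var.cell_measures.items():
            target = by_name.get(target_name)
            if target is None or target is var:
                unresolved.append((var.var_name, measure, target_name))
            else:
                resolved[(var.var_name, measure)] = target
    return resolved, unresolved


tos = LoadedVariable(
    "tos",
    "tos_Omon_IPSL-CM6A-LR_piControl_r1i1p1f1_gn_185001-234912.nc",
    {"area": "areacello"},
)
areacello = LoadedVariable(
    "areacello", "areacello_Ofx_IPSL-CM6A-LR_piControl_r1i1p1f1_gn.nc"
)

resolved, unresolved = resolve_cell_measures([tos, areacello])
print(resolved[("tos", "area")].source_file)
print(unresolved)
```

In this scheme external_variables plays no part in the lookup, which matches the point above. Until something like this is built into loading, the cell measure can be attached by hand with iris.coords.CellMeasure and Cube.add_cell_measure.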

pp-mo commented 5 years ago

pp-mo added the Feature: CF1.6/1.7 label now

Am I wrong to do this, @bjlittle? In my head, the label effectively covers all the CF we don't support.

stephenworsley commented 2 years ago

@zklaus Is this still a relevant issue?

zklaus commented 2 years ago

@stephenworsley, sorry for the long delay. In principle, I would say yes, though I have not tested Iris' behaviour recently.

pp-mo commented 2 years ago

> @stephenworsley, sorry for the long delay. In principle, I would say yes, though I have not tested Iris' behaviour recently.

I'm confident behaviour will not have changed in this respect! We have an online catch-up, I think on Mon 12th Sep 2022; if this is a live issue for you, we should discuss it then.

Might it also come up at the upcoming workshop, 18-20 Oct?

zklaus commented 2 years ago

Thanks. I think it would still be the right thing to do, but it has lower priority for us now because we handle the addition of this kind of data separately.

github-actions[bot] commented 9 months ago

In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity.

If this issue is still important to you, then please comment on this issue and the stale label will be removed.

Otherwise this issue will be automatically closed in 28 days time.

github-actions[bot] commented 8 months ago

This stale issue has been automatically closed due to a lack of community activity.

If you still care about this issue, then please either: