ESMValGroup / ESMValCore

ESMValCore: A community tool for pre-processing data from Earth system models in CMIP and running analysis scripts.
https://www.esmvaltool.org
Apache License 2.0

Make preprocessor dictionary available to the diagnostic script #70

Closed mattiarighi closed 4 years ago

mattiarighi commented 6 years ago

The preprocessor settings (levels, target_grid, etc.) used by a given diagnostic shall be made available to the diagnostic script itself via the temporary file (ncl.interface for NCL diagnostics).

mattiarighi commented 6 years ago

Given the solutions proposed in PR #181, would it be reasonable to put the preprocessor dictionary in a preproc_info file, similar to diag_script_info and variable_info?

bouweandela commented 6 years ago

Sure, that is possible. What is the use case for giving this information to the diagnostic? I think it might be good not to share too much unnecessary information, to avoid ending up with a tightly coupled system.

mattiarighi commented 6 years ago

Information such as the target level is often used in the diagnostics to name output files or to label plots.

bouweandela commented 6 years ago

The target pressure level(s) can simply be read from the file containing the preprocessed data. Could you give more examples? I'm still not convinced that this is a good idea, because it makes it much harder to change small things in the preprocessor. For example, if we wanted to rename the preprocessor setting 'levels' to 'pressure_levels' at some point because we think that is clearer, all diagnostics that depend on this variable would break.

mattiarighi commented 6 years ago

In general, all the information contained in the preprocessor dictionary could be useful in the diagnostics for labelling and output filenaming purposes.

Another example in addition to the one above could be a diagnostic comparing different regridding methods, which would need to display this information on the plots.

I understand your concern, but since the dictionary keys would be uniquely identified by a preproc_info variable in the diag scripts, a search-and-replace preproc_info@oldname --> preproc_info@newname should not be a big issue.

We could also wait until the backend is finalized before addressing this, to reduce the probability of further changes in the preproc dictionary.

axel-lauer commented 6 years ago

I can only second Mattia, all preprocessor settings need to be made available to the diagnostics. This is required not only for putting meaningful labels and titles on the plots but in particular for keeping the provenance standards we introduced in v1.1. These include, for instance, detailed figure captions where information on regridding, masking, etc. is required. I do not see any problems in terms of a "too tight coupling" since this would be simply passing through more detailed meta-data to the diagnostics than what is currently already done.

bouweandela commented 6 years ago

We have an open issue about implementing provenance: issue #240. As part of the solution to that issue, provenance information (including all settings used to preprocess the data) will be supplied to diagnostic scripts.

nielsdrost commented 6 years ago

I'm not sure we will pass provenance info to the diagnostics in #240.

But I agree with @mattiarighi and @axel-lauer that passing the preprocessor settings is probably the simplest solution for this issue. The downside is the dependency on the preproc dictionary.

Would it make sense to access the dict with a function so it is easy to spot any occurrences of access?
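To illustrate the accessor idea, here is a minimal sketch, assuming the preprocessor settings reach the diagnostic as a plain Python dictionary. The function name `get_preproc_setting`, the `_RENAMED_KEYS` table, and the example rename from 'levels' to 'pressure_levels' are all hypothetical and not part of ESMValCore; the rename is only the example raised earlier in this thread.

```python
import warnings

# Hypothetical table of renamed preprocessor keys (old name -> new name).
# The single entry mirrors the example rename discussed in this thread.
_RENAMED_KEYS = {"levels": "pressure_levels"}


def get_preproc_setting(preproc_info, key, default=None):
    """Look up a preprocessor setting through a single access point.

    Routing every lookup through this function makes all uses easy to
    find, and lets old key names keep working after a rename.
    """
    if key in preproc_info:
        return preproc_info[key]
    new_key = _RENAMED_KEYS.get(key)
    if new_key is not None and new_key in preproc_info:
        # Old name requested: warn, then fall back to the new name.
        warnings.warn(
            f"preprocessor setting '{key}' was renamed to '{new_key}'",
            DeprecationWarning,
            stacklevel=2,
        )
        return preproc_info[new_key]
    return default
```

With this approach a diagnostic would call `get_preproc_setting(settings, "levels")` instead of indexing the dictionary directly, so a rename in the preprocessor only requires updating the table rather than every diagnostic.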

LisaBock commented 5 years ago

As we use version 2 for creating figures for the next IPCC AR6 WGI draft, it is very important to solve this issue as soon as possible. We need, for example, the perfmetrics plot, which has entries for the variable ta at different levels. At the moment it is not possible to distinguish between them in the plot.

bouweandela commented 5 years ago

I do not think that I will have time to implement this within the next month.

However, a reliable solution for the perfmetrics issue, where the pressure level is needed, would be the following: the extract_levels function keeps the selected level as an auxiliary coordinate, so it can be read from the preprocessed netCDF file. E.g.:

$ ncdump -v plev recipe_python_20181210_134745/preproc/diagnostic1_preprocessor1_ta/CMIP5_bcc-csm1-1_Amon_historical_r1i1p1_T3M_ta_2000-2002.nc
netcdf CMIP5_bcc-csm1-1_Amon_historical_r1i1p1_T3M_ta_2000-2002 {
dimensions:
    time = UNLIMITED ; // (36 currently)
    lat = 64 ;
    lon = 128 ;
    bnds = 2 ;
variables:
    float ta(time, lat, lon) ;
        ta:standard_name = "air_temperature" ;
        ta:long_name = "Air Temperature" ;
        ta:units = "K" ;
        ta:cell_methods = "time: mean (interval: 20 mintues)" ;
        ta:coordinates = "day_of_month day_of_year month_number plev year" ;
    double time(time) ;
        time:axis = "T" ;
        time:bounds = "time_bnds" ;
        time:units = "day since 1950-01-01 00:00:00.0000000" ;
        time:standard_name = "time" ;
        time:long_name = "time" ;
        time:calendar = "365_day" ;
    double time_bnds(time, bnds) ;
    double lat(lat) ;
        lat:axis = "Y" ;
        lat:bounds = "lat_bnds" ;
        lat:units = "degrees_north" ;
        lat:standard_name = "latitude" ;
        lat:long_name = "latitude" ;
    double lat_bnds(lat, bnds) ;
    double lon(lon) ;
        lon:axis = "X" ;
        lon:bounds = "lon_bnds" ;
        lon:units = "degrees_east" ;
        lon:standard_name = "longitude" ;
        lon:long_name = "longitude" ;
    double lon_bnds(lon, bnds) ;
    double plev ;
        plev:units = "Pa" ;
        plev:standard_name = "air_pressure" ;
        plev:long_name = "pressure" ;
        plev:positive = "down" ;
    int64 day_of_month(time) ;
        day_of_month:units = "1" ;
        day_of_month:long_name = "day_of_month" ;
    int64 day_of_year(time) ;
        day_of_year:units = "1" ;
        day_of_year:long_name = "day_of_year" ;
    int64 month_number(time) ;
        month_number:units = "1" ;
        month_number:long_name = "month_number" ;
    int64 year(time) ;
        year:units = "1" ;
        year:long_name = "year" ;

// global attributes:
        :associated_files = "baseURL: http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation gridspecFile: gridspec_atmos_fx_bcc-csm1-1_historical_r0i0p0.nc areacella: areacella_fx_bcc-csm1-1_historical_r0i0p0.nc" ;
        :branch_time = 470. ;
        :cmor_version = "2.5.6" ;
        :comment = "The experiment starts from piControl run at year 470. RCP8.5 scenario forcing data are used beyond year 2005." ;
        :contact = "Dr. Tongwen Wu (twwu@cma.gov.cn)" ;
        :experiment = "historical" ;
        :experiment_id = "historical" ;
        :forcing = "Nat Ant GHG SD Oz Sl Vl SS Ds BC OC" ;
        :frequency = "mon" ;
        :initialization_method = 1 ;
        :institute_id = "BCC" ;
        :institution = "Beijing Climate Center(BCC),China Meteorological Administration,China" ;
        :metadata = "cmor_table: CMIP5\ndataset: bcc-csm1-1\ndiagnostic: diagnostic1\nend_year: 2002\nensemble: r1i1p1\nexp: historical\nfield: T3M\nfilename: /home/bandela/esmvaltool_output/recipe_python_20181210_134745/preproc/diagnostic1_preprocessor1_ta/CMIP5_bcc-csm1-1_Amon_historical_r1i1p1_T3M_ta_2000-2002.nc\nfrequency: mon\ninstitute: [BCC]\nlong_name: Air Temperature\nmip: Amon\nmodeling_realm: [atmos]\npreprocessor: preprocessor1\nproject: CMIP5\nreference_dataset: bcc-csm1-1\nshort_name: ta\nstandard_name: air_temperature\nstart_year: 2000\nunits: K\n" ;
        :model_id = "bcc-csm1-1" ;
        :modeling_realm = "atmos" ;
        :original_name = "T" ;
        :parent_experiment = "pre-industrial control" ;
        :parent_experiment_id = "piControl" ;
        :parent_experiment_rip = "r1i1p1" ;
        :physics_version = 1 ;
        :product = "output" ;
        :project_id = "CMIP5" ;
        :realization = 1 ;
        :source = "bcc-csm1-1:atmosphere:  BCC_AGCM2.1 (T42L26); land: BCC_AVIM1.0;ocean: MOM4_L40 (tripolar, 1 lon x (1-1/3) lat, L40);sea ice: SIS (tripolar,1 lon x (1-1/3) lat)" ;
        :table_id = "Table Amon (11 April 2011) 1cfdc7322cf2f4a32614826fab42c1ab" ;
        :title = "bcc-csm1-1 model output prepared for CMIP5 historical" ;
        :Conventions = "CF-1.5" ;
data:

 plev = 85000 ;
}
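A diagnostic could pick up this scalar coordinate and turn it into a label, roughly as sketched below. This is an illustration under stated assumptions, not the perfmetrics implementation: the helper names `read_plev` and `level_label` are hypothetical, the file is assumed to contain a scalar `plev` variable in Pa as in the listing above, and reading it relies on the third-party netCDF4 package.

```python
def read_plev(path):
    """Return the scalar pressure level (in Pa) stored in a preprocessed file.

    Assumes a scalar variable named 'plev', as in the ncdump listing.
    Requires the third-party netCDF4 package, imported lazily so that
    level_label() below can be used without it.
    """
    from netCDF4 import Dataset

    with Dataset(path) as dataset:
        return float(dataset.variables["plev"][...])


def level_label(short_name, plev_pa):
    """Build a label such as 'ta850' from a variable name and a level in Pa."""
    hpa = plev_pa / 100.0  # convert Pa to hPa
    return f"{short_name}{hpa:g}"
```

For the file shown above, `level_label("ta", read_plev(filename))` would give `ta850`, which is enough to tell apart entries for the same variable at different levels.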

valeriupredoi commented 5 years ago

Done in #174: it adds the preprocessor profile to metadata.yml.

mattiarighi commented 4 years ago

Closing since not required anymore.