mattiarighi closed this issue 4 years ago.
Given the solutions proposed in PR #181, would it be reasonable to put the preprocessor dictionary in a preproc_info file, similar to diag_script_info and variable_info?
Sure, that is possible. What is the use case for giving this information to the diagnostic? I think it would be good not to share too much unnecessary information, to avoid ending up with a tightly coupled system.
Information such as the target level is often used in the diags to name output files or for labelling purposes on the plots.
The target pressure level(s) can simply be read from the file containing the preprocessed data. Could you give more examples? I'm still not convinced that this is a good idea, because it makes it much harder to change small things in the preprocessor. For example, if we wanted to rename the preprocessor setting 'levels' to 'pressure_levels' at some point because we think that is clearer, all diagnostics that depend on this setting would break.
In general, all the information contained in the preprocessor dictionary could be useful in the diagnostics for labelling plots and naming output files.
Another example, in addition to the one above, could be a diagnostic comparing different regridding methods, which would need to display this information on the plots.
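For the regridding comparison, a sketch of deriving plot labels and output file names from the settings could look like this. It assumes the regrid step is passed to the diagnostic as a dict with target_grid and scheme entries (the argument names of the regrid preprocessor step); the exact layout handed to diagnostics, and the helper names regrid_label and output_basename, are assumptions for illustration.

```python
def regrid_label(settings):
    """Short plot label describing the regridding, e.g. '2x2 (linear)'."""
    regrid = settings.get("regrid")
    if not regrid:
        return "native grid"
    return f"{regrid.get('target_grid', '?')} ({regrid.get('scheme', '?')})"


def output_basename(dataset, short_name, settings):
    """File name stem that stays unique across regridding methods."""
    parts = [dataset, short_name]
    regrid = settings.get("regrid")
    if regrid:
        parts += [
            str(regrid.get("target_grid", "unknown")),
            str(regrid.get("scheme", "unknown")),
        ]
    # Spaces and slashes are awkward in file names, so replace them.
    return "_".join(p.replace(" ", "-").replace("/", "-") for p in parts)


if __name__ == "__main__":
    for scheme in ("linear", "nearest", "area_weighted"):
        settings = {"regrid": {"target_grid": "2x2", "scheme": scheme}}
        print(regrid_label(settings), output_basename("bcc-csm1-1", "ta", settings))
```

Without these settings, two runs that differ only in the regridding scheme would produce identical labels and overwrite each other's output files.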
I understand your concern, but since the dictionary keys would be uniquely identified by a preproc_info variable in the diag scripts, a search-and-replace preproc_info@oldname --> preproc_info@newname should not be a big issue.
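Such a rename could be scripted roughly as follows. This is a sketch assuming the diagnostics are .ncl files that access the settings as preproc_info@<key>; the function names are hypothetical. The word boundaries ensure that renaming levels does not also touch a longer key such as levels_scheme.

```python
import re
from pathlib import Path


def rename_preproc_key(source, old, new, var="preproc_info"):
    """Replace var@old with var@new; return (new_text, number_of_replacements).

    \\b on both sides avoids matching e.g. var@old_suffix or my_var@old.
    """
    pattern = re.compile(rf"\b{re.escape(var)}@{re.escape(old)}\b")
    # A function replacement avoids backslash escapes being interpreted in `new`.
    return pattern.subn(lambda _match: f"{var}@{new}", source)


def rename_in_tree(root, old, new, suffix=".ncl"):
    """Apply the rename to every diagnostic script below `root`."""
    total = 0
    for path in Path(root).rglob(f"*{suffix}"):
        text = path.read_text()
        updated, count = rename_preproc_key(text, old, new)
        if count:
            path.write_text(updated)
            total += count
    return total
```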
We could also wait until the backend is finalized before addressing this, to reduce the probability of further changes in the preproc dictionary.
I can only second Mattia: all preprocessor settings need to be made available to the diagnostics. This is required not only for putting meaningful labels and titles on the plots but in particular for keeping the provenance standards we introduced in v1.1. These include, for instance, detailed figure captions where information on regridding, masking, etc. is required. I do not see any problems in terms of "too tight coupling", since this would simply pass more detailed metadata to the diagnostics than is already done now.
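As an illustration of the caption use case, here is a sketch that turns the settings into caption text. It assumes the settings are a nested dict keyed by preprocessor step (regrid, extract_levels, mask_landsea) with their recipe arguments (target_grid, scheme, levels, mask_out), and that levels are given in Pa; the function name and the caption wording are hypothetical.

```python
def describe_preprocessing(settings):
    """Build caption sentences describing the preprocessing (hypothetical layout)."""
    sentences = []
    if "regrid" in settings:
        regrid = settings["regrid"]
        sentences.append(
            f"Data regridded to {regrid.get('target_grid', '?')} "
            f"using the {regrid.get('scheme', '?')} scheme."
        )
    if "extract_levels" in settings:
        extract = settings["extract_levels"]
        levels = extract.get("levels", [])
        if not isinstance(levels, (list, tuple)):
            levels = [levels]
        # Assumes levels are in Pa; convert to hPa for readability.
        hpa = ", ".join(f"{level / 100:g} hPa" for level in levels)
        sentences.append(
            f"Vertical level(s) {hpa} extracted using "
            f"{extract.get('scheme', '?')} interpolation."
        )
    if "mask_landsea" in settings:
        masked = settings["mask_landsea"].get("mask_out", "?")
        sentences.append(f"{masked.capitalize()} points masked out.")
    # Fixed order of checks keeps captions stable regardless of dict order.
    return " ".join(sentences)
```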
We have an open issue about implementing provenance, issue #240. As part of the solution to that issue, provenance information (including all settings used to preprocess the data) will be supplied to diagnostic scripts.
I'm not sure we will pass provenance info to the diagnostics in #240.
But I agree with @mattiarighi and @axel-lauer that passing the preprocessor settings is probably the simplest solution for this issue. The downside is the dependency on the preprocessor dictionary.
Would it make sense to access the dict with a function so it is easy to spot any occurrences of access?
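A sketch of such an accessor, assuming the settings arrive as a nested dict keyed by preprocessor step and then by argument. get_preproc_setting and the _ALIASES table are hypothetical. The idea is that every access goes through one searchable function, and a later rename of a key (such as 'levels' to 'pressure_levels') can be absorbed in one place instead of in every diagnostic.

```python
_MISSING = object()  # sentinel so that None can be a legitimate default

# Hypothetical alias table: key requested by diagnostics -> key after a rename.
_ALIASES = {
    ("extract_levels", "levels"): ("extract_levels", "pressure_levels"),
}


def get_preproc_setting(settings, step, key, default=_MISSING):
    """Single access point for preprocessor settings."""
    candidates = [(step, key)]
    if (step, key) in _ALIASES:
        candidates.append(_ALIASES[(step, key)])
    for candidate_step, candidate_key in candidates:
        try:
            return settings[candidate_step][candidate_key]
        except KeyError:
            continue
    if default is not _MISSING:
        return default
    raise KeyError(f"preprocessor setting {step}.{key} not found")
```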
As we are using version 2 to create figures for the next IPCC AR6 WGI draft, it is very important to solve this issue as soon as possible. We need, for example, the perfmetrics plot, which has entries for the variable ta at different levels. At the moment, it is not possible to distinguish between them in the plot.
I do not think that I will have time to implement this within the next month.
However, a reliable solution for the pressure-level issue in perfmetrics would be the following: the extract_levels function keeps the selected level as an auxiliary coordinate, so it can be read from the preprocessed netCDF file, e.g.:
```
$ ncdump -v plev recipe_python_20181210_134745/preproc/diagnostic1_preprocessor1_ta/CMIP5_bcc-csm1-1_Amon_historical_r1i1p1_T3M_ta_2000-2002.nc
netcdf CMIP5_bcc-csm1-1_Amon_historical_r1i1p1_T3M_ta_2000-2002 {
dimensions:
    time = UNLIMITED ; // (36 currently)
    lat = 64 ;
    lon = 128 ;
    bnds = 2 ;
variables:
    float ta(time, lat, lon) ;
        ta:standard_name = "air_temperature" ;
        ta:long_name = "Air Temperature" ;
        ta:units = "K" ;
        ta:cell_methods = "time: mean (interval: 20 mintues)" ;
        ta:coordinates = "day_of_month day_of_year month_number plev year" ;
    double time(time) ;
        time:axis = "T" ;
        time:bounds = "time_bnds" ;
        time:units = "day since 1950-01-01 00:00:00.0000000" ;
        time:standard_name = "time" ;
        time:long_name = "time" ;
        time:calendar = "365_day" ;
    double time_bnds(time, bnds) ;
    double lat(lat) ;
        lat:axis = "Y" ;
        lat:bounds = "lat_bnds" ;
        lat:units = "degrees_north" ;
        lat:standard_name = "latitude" ;
        lat:long_name = "latitude" ;
    double lat_bnds(lat, bnds) ;
    double lon(lon) ;
        lon:axis = "X" ;
        lon:bounds = "lon_bnds" ;
        lon:units = "degrees_east" ;
        lon:standard_name = "longitude" ;
        lon:long_name = "longitude" ;
    double lon_bnds(lon, bnds) ;
    double plev ;
        plev:units = "Pa" ;
        plev:standard_name = "air_pressure" ;
        plev:long_name = "pressure" ;
        plev:positive = "down" ;
    int64 day_of_month(time) ;
        day_of_month:units = "1" ;
        day_of_month:long_name = "day_of_month" ;
    int64 day_of_year(time) ;
        day_of_year:units = "1" ;
        day_of_year:long_name = "day_of_year" ;
    int64 month_number(time) ;
        month_number:units = "1" ;
        month_number:long_name = "month_number" ;
    int64 year(time) ;
        year:units = "1" ;
        year:long_name = "year" ;

// global attributes:
        :associated_files = "baseURL: http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation gridspecFile: gridspec_atmos_fx_bcc-csm1-1_historical_r0i0p0.nc areacella: areacella_fx_bcc-csm1-1_historical_r0i0p0.nc" ;
        :branch_time = 470. ;
        :cmor_version = "2.5.6" ;
        :comment = "The experiment starts from piControl run at year 470. RCP8.5 scenario forcing data are used beyond year 2005." ;
        :contact = "Dr. Tongwen Wu (twwu@cma.gov.cn)" ;
        :experiment = "historical" ;
        :experiment_id = "historical" ;
        :forcing = "Nat Ant GHG SD Oz Sl Vl SS Ds BC OC" ;
        :frequency = "mon" ;
        :initialization_method = 1 ;
        :institute_id = "BCC" ;
        :institution = "Beijing Climate Center(BCC),China Meteorological Administration,China" ;
        :metadata = "cmor_table: CMIP5\ndataset: bcc-csm1-1\ndiagnostic: diagnostic1\nend_year: 2002\nensemble: r1i1p1\nexp: historical\nfield: T3M\nfilename: /home/bandela/esmvaltool_output/recipe_python_20181210_134745/preproc/diagnostic1_preprocessor1_ta/CMIP5_bcc-csm1-1_Amon_historical_r1i1p1_T3M_ta_2000-2002.nc\nfrequency: mon\ninstitute: [BCC]\nlong_name: Air Temperature\nmip: Amon\nmodeling_realm: [atmos]\npreprocessor: preprocessor1\nproject: CMIP5\nreference_dataset: bcc-csm1-1\nshort_name: ta\nstandard_name: air_temperature\nstart_year: 2000\nunits: K\n" ;
        :model_id = "bcc-csm1-1" ;
        :modeling_realm = "atmos" ;
        :original_name = "T" ;
        :parent_experiment = "pre-industrial control" ;
        :parent_experiment_id = "piControl" ;
        :parent_experiment_rip = "r1i1p1" ;
        :physics_version = 1 ;
        :product = "output" ;
        :project_id = "CMIP5" ;
        :realization = 1 ;
        :source = "bcc-csm1-1:atmosphere: BCC_AGCM2.1 (T42L26); land: BCC_AVIM1.0;ocean: MOM4_L40 (tripolar, 1 lon x (1-1/3) lat, L40);sea ice: SIS (tripolar,1 lon x (1-1/3) lat)" ;
        :table_id = "Table Amon (11 April 2011) 1cfdc7322cf2f4a32614826fab42c1ab" ;
        :title = "bcc-csm1-1 model output prepared for CMIP5 historical" ;
        :Conventions = "CF-1.5" ;
data:

 plev = 85000 ;
}
```
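On the diagnostic side, the scalar plev coordinate shown in that dump can be turned into a label that keeps the ta entries at different levels apart. A minimal sketch, assuming the variable is named plev with units of Pa or hPa as in the dump; read_scalar_plev uses the third-party netCDF4 package (imported lazily so that the labelling helper works without it), and both function names are hypothetical.

```python
def level_label(short_name, plev_value, units="Pa"):
    """Build a plot label such as 'ta 850 hPa' from a scalar pressure level."""
    divisors = {"Pa": 100.0, "hPa": 1.0}
    if units not in divisors:
        raise ValueError(f"unexpected pressure units: {units}")
    return f"{short_name} {plev_value / divisors[units]:g} hPa"


def read_scalar_plev(path):
    """Return (value, units) of the scalar 'plev' variable in a preprocessed file."""
    import netCDF4  # third-party dependency, only needed when reading files

    with netCDF4.Dataset(path) as dataset:
        plev = dataset.variables["plev"]
        # [...] reads the full (here zero-dimensional) variable.
        return float(plev[...]), plev.units


# Possible usage with a preprocessed file such as the one dumped above:
#   value, units = read_scalar_plev(filename)
#   label = level_label("ta", value, units)
```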
Done in #174: it adds the preprocessor profile to metadata.yml.
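A diagnostic could then use metadata.yml to keep, for example, the perfmetrics entries for ta at different levels apart by grouping the preprocessed files by preprocessor. This sketch assumes metadata.yml maps each preprocessed file name to a dict of attributes that includes a 'preprocessor' key, as in the metadata attribute of the file dumped above; the function names are hypothetical, and load_metadata needs the third-party PyYAML package.

```python
from collections import defaultdict


def load_metadata(path):
    """Load metadata.yml (requires PyYAML)."""
    import yaml  # third-party, imported lazily

    with open(path) as handle:
        return yaml.safe_load(handle)


def group_by_preprocessor(metadata):
    """Map each preprocessor name to the sorted list of files that used it."""
    groups = defaultdict(list)
    for filename, attributes in metadata.items():
        groups[attributes.get("preprocessor", "unknown")].append(filename)
    return {name: sorted(files) for name, files in groups.items()}
```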
Closing since not required anymore.
The preprocessor settings (levels, target_grid, etc.) used by a given diagnostic shall be made available to the diagnostic script itself via the temporary file (ncl.interface for NCL diagnostics).