Closed schlunma closed 3 years ago
One additional thought: For some fx preprocessors we automatically add fx files if none are given (masking and land fraction weighting), for some we don't do this (area and volume statistics):
Would it make sense to unify this?
I think a good middle point solution would be to raise a warning instead of an exception during the broadcasting when adding the variables: if the shapes do not match, the variable will not be added into the cube. I think this should make both versions of the recipe work.
One additional thought: For some fx preprocessors we automatically add fx files if none are given (masking and land fraction weighting), for some we don't do this (area and volume statistics)
I don't know, this was the old behaviour as well. Not sure what was the reason for that.
I think a good middle point solution would be to raise a warning instead of an exception during the broadcasting when adding the variables: if the shapes do not match, the variable will not be added into the cube. I think this should make both versions of the recipe work.
Yes, fully agree! But maybe only raise a warning if no fx files could be added, so that we don't get unnecessary warnings if sftof
cannot be added for an atmospheric variables but sftlf
is perfectly fine.
One additional thought: For some fx preprocessors we automatically add fx files if none are given (masking and land fraction weighting), for some we don't do this (area and volume statistics)
I don't know, this was the old behaviour as well. Not sure what was the reason for that.
Right. Any opinions on this @ESMValGroup/esmvaltool-coreteam? I think it would make sense to add them as default. This shouldn't affect any recipes since if there are issues with the fx variables (no dataset available, wrong shape) they will be ignored and the tool will fall back to the current calculation of the preprocessors.
One more thing: I just realized that the one test is failing since #999: https://github.com/ESMValGroup/ESMValCore/runs/2509144545?check_suite_focus=true#step:7:1581 This shouldn't be hard to fix though.
One more thing: I just realized that the one test is failing since #999: https://github.com/ESMValGroup/ESMValCore/runs/2509144545?check_suite_focus=true#step:7:1581 This shouldn't be hard to fix though.
I don't know. I have this error locally but it is not detected by the CI (see #1099 as an example)
I think it would make sense to add them as default. This shouldn't affect any recipes since if there are issues with the fx variables (no dataset available, wrong shape) they will be ignored and the tool will fall back to the current calculation of the preprocessors.
I think that makes sense, especially for the volume_statistics
step because there is only one option for the volume variable (volcello
).
The only thing about this approach that is a bit inconvenient is that it would load and check fx_variables that are not needed, and if those variables do not pass the checks it could become a bit confusing ("Why is my recipe crashing because areacella does not pass the checks, if I only need areacello for my ocean variables"). But that can already happen for the mask_landsea
step anyway.
maybe add an extra option a la use_other_available_fx_variables: True
in recipe? We do want to use the NE masks, as it was before, and as @schlunma well points out, in case none of the fx files are available, but not for certain cases (like sftgif
or volcello
) :beer:
Maybe something like checking the mip of the variable? If it's related to an ocean realm it should start with O (and SI for the ice), and areacello and sftof could be set as the default options. And if that is not the case, set areacella and sftlf as the default ones.
I wouldn't make this dependent on the variable. Sometimes it might be useful when an atmospheric variables has access to e.g. sftof
, for example when they share the grid and no sftlf
is available.
The only thing about this approach that is a bit inconvenient is that it would load and check fx_variables that are not needed, and if those variables do not pass the checks it could become a bit confusing ("Why is my recipe crashing because areacella does not pass the checks, if I only need areacello for my ocean variables"). But that can already happen for the
mask_landsea
step anyway.
You could add a try ... except CMORCheckError as exc ... raise CMORCheckError("...") from exc
to the checks/fixes here and add a more detailed error description that contains the preprocessor that originally added the fx variable and a note that this error can be avoided by explicitly specifying an fx variable.
I think the easiest thing would be to check the shape after loading but before cmor checking. If the fx var cannot be broadcasted to the cube, discard the fx_variable and try the next one. I'll finish this next week though.
@sloosvel @schlunma I would like to address also another inconsistency in preprocessors within _time.py using the aggregated_by
iris function that do not preserve the fx
field as cell_measures
.
A similar issue occurs also in _vertical_interpolate
function that returns a new cube using _create_cube
that is not handling the presence of cell_measures
in the source cube.
As I understood from iris documentation cell_measures are not yet preserved in aggregated_by
operation , so I thought about a workaround function, like the following
def aggregated_by(cube, coords, operator):
"""Compute iris aggregation preserving cell_measures.
Parameters
----------
cube: iris.cube.Cube
input cube.
coords: list of coord names
Coordinate(s) over which group aggregation is to be performed.
operator: str
Select operator to apply.
Returns
-------
iris.cube.Cube
Cube aggregated upon operator
"""
from ._ancillary_vars import add_fx_variables
if cube.cell_measures():
for measure in cube.cell_measures():
fx_variables = cube.cell_measure(
measure.standard_name).attributes['fx_variables']
cube = cube.aggregated_by(coords, operator)
cube = add_fx_variables(cube, fx_variables, CheckLevels.DEFAULT)
else:
cube = cube.aggregated_by(coords, operator)
return cube
which requires the following change to _ancillary_vars.py
to keep track of the original fx
input file(s)
@@ -163,6 +163,8 @@ def add_fx_variables(cube, fx_variables, check_level):
if isinstance(fx_info['filename'], str):
fx_info['filename'] = [fx_info['filename']]
fx_cube = _load_fx(fx_info, check_level)
+ # store fx_variables for reuse
+ fx_cube.attributes['fx_variables'] = fx_variables
measure_name = {
'areacella': 'area',
Thanks for poiting that out @tomaslovato. Since some fx vars can be time dependant, would it make sense to also perform the time statistics on them? Maybe those preprocessor steps could check the presence of cell measures and work on both the var cube and the measure cube, and then re-add the measures at the end. Instead of working with the original files again.
@tomaslovato Unfortunately I think it's not that easy. The shape of the aggregated cube does (most likely) not match the shape of the fx files, so they cannot simply be added.
It does also not make sense to simply perform the same statistic that is used on the data on the fx files. For example, if you compute the sum of the data over time, it does not make sense to calculate the sum of the cell areas over time. I pretty sure that if it was that simple, iris
would have implemented that.
A possible solution for time independent fx variables would be to specify only the relevant dimensions in add_cell_measure
or add_ancilliary_variable
that the fx variable actually cover (e.g. latitude
and longitude
) without broadcasting them before. If a statistic is calculated over a dimension that is not covered by the fx variables (e.g. time
), aggregated_by
and collapsed
happily preserve the cell measures/ancialliary vars:
However, this would involve quite some changes in the code, since the broadcasting has to be done in other preprocesser functions.
For time-dependent fx variables the situation is more difficult. One approach could be to always use mean
for the statistics of the fx variables, but I pretty sure this is not correct in general.
@schlunma thanks for your feedback! I may guess that when I did the test with the time statistics my version of iris was not up to date ... as I tried again and cell_measures are preserved.
However the issue is still valid for _vertical_interpolate
that creates a new cube and cell_measures
are not accounted for.
In some case by reordering the preprocessor we can work around this problem, but I'm not sure it will always be possible.
I found some small issues with the new implementation of the fx preprocessors in #999 (great job btw there @sloosvel!).
Issues
Not directly related to #999, but do we have a list of all preprocessors for which
fx_variables
can be specified? Maybe I'm blind, but I didn't find it in the documentation. Basically the list given here: https://github.com/ESMValGroup/ESMValCore/blob/fee5d541dc871efe72ab972a6a52195c452d16eb/esmvalcore/_recipe.py#L482-L485For
mask_landsea
we had checks for the shape of the fx variables in place. If the shapes did not match, Natural Earth masks were used. Right now, if the shapes do not match, an error is printed as reported by @LisaBock here:ValueError: Dimensions of tas and sftof cubes do not match. Cannot broadcast cubes.
This is in particular problematic since formask_landsea
, the tool automatically adds the fx variablessftlf
andsftof
if none is specified. Sincesftlf
is always checked first, the tool is most likely to fail if an ocean variable is specified since the shapes ofsfltf
and ocean variables differ for most models. For example, the first recipe fails with the errorValueError: Dimensions of tos and sftlf cubes do not match. Cannot broadcast cubes.
, and the second one runs fine:Recipe 1 (fails)
```yml preprocessors: test: mask_landsea: mask_out: land diagnostics: test: variables: tos: preprocessor: test project: CMIP5 mip: Omon exp: historical start_year: 2000 end_year: 2000 ensemble: r1i1p1 additional_datasets: - {dataset: CanESM2} scripts: null ```Recipe 2 (works)
```yml preprocessors: test: mask_landsea: mask_out: land fx_variables: sftof: diagnostics: test: variables: tos: preprocessor: test project: CMIP5 mip: Omon exp: historical start_year: 2000 end_year: 2000 ensemble: r1i1p1 additional_datasets: - {dataset: CanESM2} scripts: null ```For
weighting_landsea_fraction
, we had similar checks. The tool triedsftlf
andsftof
, and only if both had different shapes, an error was printed. Right now, due to the prioritization ofsftlf
, an error is e.g. printed for every ocean variables that is not on an atmospheric grid (thus does not match the shape ofsftlf
), even thoughsftof
would work just fine.As far as I can tell from the code (didn't test though!) the remaining preprocessors should behave similarly to the old version (the error messages might differ, though).
The following message is now printed to the log. I think we should demote this to a debug message and maybe only print the
short_name
of the fx variables as info message.Log
``` 2021-05-05 08:28:29,557 UTC [14161] INFO Using fx_files: {'sftlf': {'alias': 'CanESM2', 'dataset': 'CanESM2', 'diagnostic': 'test', 'end_year': 2000, 'ensemble': 'r0i0p0', 'exp': 'historical', 'filename': '/pf/b/b309141/work/CMIP5_DKRZ/CCCma/CanESM2/historical/fx/atmos/fx/r0i0p0/v20120410/sftlf/sftlf_fx_CanESM2_historical_r0i0p0.nc', 'frequency': 'fx', 'institute': ['CCCma'], 'long_name': 'Land Area Fraction', 'mip': 'fx', 'modeling_realm': ['atmos'], 'original_short_name': 'sftlf', 'preprocessor': 'test', 'project': 'CMIP5', 'recipe_dataset_index': 0, 'short_name': 'sftlf', 'standard_name': 'land_area_fraction', 'start_year': 2000, 'units': '%', 'variable_group': 'sftlf'}, 'sftof': {'alias': 'CanESM2', 'dataset': 'CanESM2', 'diagnostic': 'test', 'end_year': 2000, 'ensemble': 'r0i0p0', 'exp': 'historical', 'filename': '/pf/b/b309141/work/CMIP5_DKRZ/CCCma/CanESM2/historical/fx/ocean/fx/r0i0p0/v20130119/sftof/sftof_fx_CanESM2_historical_r0i0p0.nc', 'frequency': 'fx', 'institute': ['CCCma'], 'long_name': 'Sea Area Fraction', 'mip': 'fx', 'modeling_realm': ['ocean'], 'original_short_name': 'sftof', 'preprocessor': 'test', 'project': 'CMIP5', 'recipe_dataset_index': 0, 'short_name': 'sftof', 'standard_name': 'sea_area_fraction', 'start_year': 2000, 'units': '%', 'variable_group': 'sftof'}} for variable tos during step mask_landsea ```Possible solution
The main issues here are 2. and 3. I think both can be fixed if we don't raise an exception here and here if the shapes of the ancilliary files do not match the cube. If a preprocessor cannot handle the case that no fx files are found/the shapes differ, the errors should be raised in the corresponding preprocessing function.
It would be also nice to add some tests for these cases, similiar to the ones we had before.