Closed ledm closed 4 years ago
I think that the absence of `fx_files` is a significant mathematical problem that we will need to address in the `meridional_statistics` function when using the `mean` operator.
In the zonal direction, the weighting is less of a problem.
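To illustrate the weighting point, here is a minimal, purely hypothetical numpy sketch (not ESMValTool code; latitudes and values are made up): a plain `mean` treats every cell as equally large, whereas an area-aware mean would weight by something like cos(latitude), since cells shrink towards the poles.

```python
import numpy as np

# Hypothetical cell-centre latitudes and a made-up field to average.
lats = np.array([0.0, 30.0, 60.0])
values = np.array([10.0, 20.0, 30.0])

# Plain mean: every latitude band counts the same.
unweighted = values.mean()                      # 20.0

# cos(lat)-weighted mean: high-latitude cells count for less.
weights = np.cos(np.deg2rad(lats))
weighted = np.average(values, weights=weights)  # < 20.0
```

With a proper `areacello` fx field, the weights would come from the file rather than from cos(lat).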
One thing I'm working on here is adding `fx_files` to the `recipe_ocean_example.yml` recipe - basically making it accurate!
When ESMValTool is unable to locate `fx_files`, everything seems to continue as normal. @valeriupredoi, I would expect a fatal error to be raised if an FX file was requested but not found. Is that not the case?
If the masking is done in the preprocessor and the fx files are not found, the code defaults to applying Natural Earth masks instead.
It doesn't work like this in the `area_statistics` and `volume_statistics` functions. They too try to calculate the area or volume fields themselves, but I feel that this is very inaccurate. We'd be better off if it failed when there are no fx_files. After all, area and volume descriptions are a CMIP5/6 requirement, right?
Should the error message be added in the `get_input_fx_filelist` function of `_data_finder.py`, or in the `area_statistics` and `volume_statistics` functions?
So I guess the problem I'm seeing right now is: why can't ESMValTool find the FX files anymore?
Another point is that none of the preprocessor tests include fx files. How can we write tests that include fx files?
Help @valeriupredoi! None of my recipes are able to find the fx_files anymore on my local desktop!
Can anyone else test a recipe that requires an fx_file to be loaded? @schlunma - have you had this problem?
@ledm I literally just ran a recipe with fx files grabbed from a local repo, all fine. Are you using CMIP6? If so, there are missing input structures for fx files in `config-developer`.
Which recipe?
Are you 100% sure that it did actually load and use the fx files? Several of the preprocessors have a fall-back option if they can't find the fx_files.
Also, in my case, I'm clearly seeing that ESMValTool can see the fx files, but the preprocessor cannot find them!
well they are needed in the diagnostic so that'd fail if there was no fx file in the metadata file. I ran the `recipe_autoassess_landurface_surfrad` recipe; which one are you running?
cheers, will test now with that
on the branch now, it finds the fx files and it says it's using them, but I am running into this before anything else is done diagnostic-wise:

```
ValueError: Unknown preprocessor function 'zonal_statistics', choose from: download, fix_file,...
```
Yes, I'm working on that preprocessor in branch `development_Zonal_Meridional_statistics_24`. However, you should be able to test the recipe by commenting out that diagnostic (and the other zonal & meridional diagnostics). Thanks!
If you want to look at the same Core branch, it's in `development_Zonal_Meridional_statistics_24`.
So, my conclusion is that there's been a change in the way that recipes work. *Edit: either that or fx_files have never worked the way I thought they did!*
In order to get the recipe to work, I needed to move the `fx_files: [areacello,]` line from the diagnostics section to the preprocessor section! This raises the problem that the fx_files are not passed through the other preprocessors. I.e., if the preprocessor includes an operator like `extract_region`, the fx_file is not subjected to that function, so the fx data no longer matches the preprocessed data.
> It doesn't work like this in the area_statistics & volume_statistics functions. They too try to calculate the area or volume fields, but I feel that this is very inaccurate. We'd be better off if it fails when there's no fx_files. After all, area and volume descriptions are a CMIP5/6 requirement, right?
What about observational datasets? The function should work for those too I think.
> Another point is that none of the preprocessor tests include fx files. How can we write tests that include fx files?
Load a variable of interest + fx variable from file using iris, slice both cubes so they are small (i.e. no more than a couple of points) and look at the data and coordinates. You can use this to create the cubes needed for the tests. Does that help?
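A rough, numpy-only sketch of that slicing idea (the real fixtures would be iris cubes; the shapes here are invented):

```python
import numpy as np

# Stand-ins for a (time, lat, lon) variable and its (lat, lon) areacello
# fx field, as if they had been loaded from file.
data = np.arange(2 * 180 * 360, dtype=float).reshape(2, 180, 360)
areacello = np.ones((180, 360))

# Slice both down to a couple of points so the test fixture stays tiny,
# keeping the spatial dimensions of the two fields consistent.
small_data = data[:, :2, :3]
small_fx = areacello[:2, :3]

assert small_fx.shape == small_data.shape[-2:]  # (2, 3)
```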
Note that there are two ways in which the recipe uses fx_files:
1. Specified in the variable sections: this will just find fx files and pass their paths on to your diagnostic script. This is in the process of being changed in #21; fx variables should be treated as any other variables. I see you are using this in your recipe.
2. Used by preprocessor functions: there is no need to mention fx files in the recipe at all, this is filled in automatically. The code that does this is here: https://github.com/ESMValGroup/ESMValCore/blob/development/esmvalcore/_recipe.py#L364-L422
> It doesn't work like this in the area_statistics & volume_statistics functions. They too try to calculate the area or volume fields, but I feel that this is very inaccurate. We'd be better off if it fails when there's no fx_files. After all, area and volume descriptions are a CMIP5/6 requirement, right?

> What about observational datasets? The function should work for those too I think.
How should we differentiate in the preprocessor whether a preprocessor requires an fx file or not? I think it should fail for model data when no FX file is available, especially when the model uses irregular grids. There's an argument that regular grids are easier for Iris to calculate - however, Bill Little mentioned that the `iris.analysis.cartography.area_weights` calculator might be removed in the near future anyway!
> Another point is that none of the preprocessor tests include fx files. How can we write tests that include fx files?

> Load a variable of interest + fx variable from file using iris, slice both cubes so they are small (i.e no more than a couple of points) and look at the data and coordinates. You can use this to create the cubes needed for the tests. Does that help?
Sort of. Will the new sliced data and fx file need to be added to the git repository in testing data or something?
> Note that there are two ways in which the recipe uses fx_files:
> - Specified in the variable sections, this will just find fx files and pass their paths on to your diagnostic script. This is in the process of being changed #21, fx variables should be treated as any other variables. I see you are using this in your recipe.
> - Use by preprocessor functions: there is no need to mention fx files in the recipe at all, this is automatically filled in. The code that does this is here: https://github.com/ESMValGroup/ESMValCore/blob/development/esmvalcore/_recipe.py#L364-L422
I just spoke with V and I think it would be worth shelving this part of the work until #21 has completed.
> How should we differentiate in the preprocessor whether a preprocessor requires an fx file or not? I think it should fail for model data when no FX file is available.
I think this is something we could implement in _recipe.py, because we know what dataset we are building a preprocessor for there. Can you make a separate issue for it?
> Will the new sliced data and fx file need to be added to the git repository in testing data or something?
I would just create the iris cube in code and save it to a temporary file if you need to read it from file for your test.
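A minimal sketch of that suggestion, using numpy in place of iris so it stays self-contained (with real cubes, `iris.save` and `iris.load_cube` would play the roles of `np.save` and `np.load`; the filename is made up):

```python
import os
import tempfile

import numpy as np

# Build the tiny test data in code rather than shipping files in the repo.
data = np.arange(6, dtype=float).reshape(2, 3)

# Write it to a temporary file only if the test needs to read from disk.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "tiny_fixture.npy")
    np.save(path, data)
    loaded = np.load(path)

assert (loaded == data).all()
```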
Created issue #103.
@ledm your mysteriously vanishing fx files are behaving like this because you need to specify the `fx_files` key argument under each of those `_statistics` preprocessor calls in the recipe, and not in the variable: the code in `_recipe.py` reads:
```python
for step in ('area_statistics', 'volume_statistics', 'zonal_statistics',
             'meridional_statistics'):
    if settings.get(step, {}).get('fx_files'):
        settings[step]['fx_files'] = get_input_fx_filelist(
            variable=variable,
            rootpath=config_user['rootpath'],
            drs=config_user['drs'],
        )
```
The `if settings.get(step, {}).get('fx_files')` is null if fx files are not in the step's settings. Also note that whoever coded this up here (@bouweandela ?) forgot to add the needed `variable['fx_files'] = settings.get(step, {}).get('fx_files')`, so that the block reads:
```python
for step in ('area_statistics', 'volume_statistics', 'zonal_statistics',
             'meridional_statistics'):
    if settings.get(step, {}).get('fx_files'):
        variable['fx_files'] = settings.get(step, {}).get('fx_files')
        settings[step]['fx_files'] = get_input_fx_filelist(
            variable=variable,
            rootpath=config_user['rootpath'],
            drs=config_user['drs'],
        )
```
with that in, all your fx ducks are in a row (if the files actually exist on ESGF or in your local DB).
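For reference, a sketch of what that placement looks like in a recipe (the preprocessor name here is made up; the keys follow the snippets quoted in this thread):

```yaml
preprocessors:
  prep_ocean_area_mean:    # hypothetical preprocessor name
    area_statistics:
      operator: mean
      fx_files: [areacello, ]
```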
There is, however, some bad coding in the stats functions themselves, because I get this (after some running):

```
ValueError: Fx area (216, 360) and dataset (48, 196, 111) shapes do not match.
```

which suggests to me that you are applying a 2D mask to 3D data :beer:
Thanks V!
Did we always have to specify the fx_file in the preprocessor, or is that a new thing? So far, I've always been specifying it in the diagnostics. Perhaps my recipes have been wrong all along. (There were no documentation or examples at the time, so I refuse to feel too guilty about that!)
I can't tell for sure what's happening with your shapes not matching, as I can't tell which preprocessor/diagnostic you're running. However, this may be a problem if the fx grid does not receive the same preprocessing as the dataset. For instance, if the `extract_region` preprocessor reduces the grid from `(216, 360)` to `(196, 111)`, then we can no longer use the same `areacello` to match it!
Perhaps the solution is for `extract_region` to apply a mask instead of changing the shape of the dataset?
I dunno, it may be a mistake in my recipe; I added `fx_files: [areacello, ]` everywhere under each of the `_statistics` functions for testing, maybe I should have been more careful :grin:
No, there was no documentation, and in fact it is the first time I have seen the addition of `fx_files:` in the preprocessor myself; quite a mystery on our hands :world_map:
well actually here's the bugger:

```yaml
custom_order: true
extract_levels:
  levels: [0., 10., 100., 1000.]
  scheme: linear_horizontal_extrapolate_vertical
extract_region:
  start_longitude: -80.
  end_longitude: 30.
  start_latitude: -80.
  end_latitude: 80.
area_statistics:
  operator: mean
  fx_files: [areacello, ]
```
you are applying a (216, 360) mask to a (48, 10, 96, 21) cube, and the catcher in `area_statistics` is:

```python
if grid_areas.shape != cube.shape[-2:]:
    raise ValueError('Fx area {} and dataset {} shapes do not match.'
                     ''.format(grid_areas.shape, cube.shape))
```
so yeah, you need to extract the same region and levels from the fx mask as from the data
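The mismatch (and the direction of the fix) can be reproduced with plain numpy shapes; the dimensions are copied from the traceback above, and the helper name is invented:

```python
import numpy as np

grid_areas = np.ones((216, 360))       # un-preprocessed areacello
cube_data = np.ones((48, 10, 96, 21))  # cube after extract_region + levels

def check_fx_shape(areas, data):
    """Mirror of the guard quoted above: the fx field must match the
    last two (lat, lon) dimensions of the data."""
    if areas.shape != data.shape[-2:]:
        raise ValueError('Fx area {} and dataset {} shapes do not match.'
                         ''.format(areas.shape, data.shape))

try:
    check_fx_shape(grid_areas, cube_data)
except ValueError as exc:
    message = str(exc)  # the error reported in this thread

# Extracting the same region from the fx field first makes the check pass:
check_fx_shape(grid_areas[:96, :21], cube_data)
```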
Exactly, that's what I've been saying! The preprocessor needs to be applied to the fx files as well as the actual dataset. This doesn't happen if the `fx_files` are given in the preprocessor section of the recipe. And if they're given in the diagnostic section of the recipe, the `area_statistics` preprocessor doesn't receive them at all!
> Also note that whoever coded this up here (@bouweandela ?) forgot to add the needed
Use `git blame` if you want to find out who wrote something, e.g.

```
$ git blame esmvalcore/_recipe.py | grep -A6 area_statistics
18f5d3882 esmvaltool/_recipe.py (Manuel Schlund 2019-06-05 15:00:01 +0200 416)     for step in ('area_statistics', 'volume_statistics'):
baa4fb2f4 esmvaltool/_recipe.py (Manuel Schlund 2019-02-08 12:23:00 +0100 417)         if settings.get(step, {}).get('fx_files'):
8267a24c3 esmvaltool/_recipe.py (Lee de Mora 2019-01-29 11:07:32 +0000 418)             settings[step]['fx_files'] = get_input_fx_filelist(
52f0a8d74 esmvaltool/_recipe.py (Manuel Schlund 2019-02-08 12:20:16 +0100 419)                 variable=variable,
61e642c36 esmvaltool/_recipe.py (Lee de Mora 2019-01-23 14:31:21 +0000 420)                 rootpath=config_user['rootpath'],
8267a24c3 esmvaltool/_recipe.py (Lee de Mora 2019-01-29 11:07:32 +0000 421)                 drs=config_user['drs'],
8267a24c3 esmvaltool/_recipe.py (Lee de Mora 2019-01-29 11:07:32 +0000 422)             )
```
ah, I shed a tear every time I see `git blame` :grin:
I'm back to work on this issue, starting from @valeriupredoi's new PR #170.
I'm slowly coming to the conclusion that we cannot sensibly produce zonal or meridional statistics for irregular grids.
This is because iris can't apply `iris.analysis` functions along one dimension of an irregular grid; it simply does not work at the moment. The `iris.analysis` functions can only be applied along the x-y axes of an array, not along an arbitrary latitude-like axis of an irregular grid. (I believe that this is a fairly permanent problem, and this functionality won't be ready until Iris 3 is ready, in 2023 or similar!)
This means that if we want to calculate zonal or meridional statistics for irregular grids, we need to regrid the data to a regular grid, then apply the zonal or meridional statistics to that regular grid. This requires ESMValTool to regrid the fx_files (`areacello` or `volcello`), or to recalculate them from the new grid. Neither option is particularly precise. Furthermore, the regrid function does not currently apply to the fx_files (see this comment.)
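A small numpy illustration of the underlying problem (shapes and values invented): on a regular grid the zonal mean is a reduction along one array axis, but on a curvilinear grid the latitude coordinate is itself 2-D, so no single axis corresponds to "constant latitude".

```python
import numpy as np

# Regular grid: latitude is 1-D, so the zonal mean is just an axis reduction.
data = np.arange(12, dtype=float).reshape(3, 4)   # (lat, lon)
lat_1d = np.array([-60.0, 0.0, 60.0])
zonal_mean = data.mean(axis=1)                    # one value per latitude
assert zonal_mean.shape == lat_1d.shape

# Curvilinear grid: latitude varies along both array axes, so reducing
# along either axis mixes different latitudes together.
lat_2d = np.array([[-60.0, -59.5, -59.0, -58.5],
                   [  0.5,   0.0,  -0.5,  -1.0],
                   [ 58.5,  59.0,  59.5,  60.0]])
assert lat_2d.ndim == 2   # cells in one row do not share a latitude
```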
So, I propose that we do not support zonal or meridional statistics for irregular grids. Any thoughts?
fix in #214 now :beer:
Nope! These changes are obsolete, all is done up in `development` already! Yay :tada:
I'm not sure that this issue was ever resolved. As @mattiarighi just pointed out to me in an email, the `zonal_statistics` and `meridional_statistics` preprocessors are included in the ESMValTool v2 paper, so we need to get them in.
Apparently it's just about renaming `zonal_means` to `zonal_statistics`.
I'm doing this. Will submit in a second.
To summarise where this issue got to, we encountered 3 main problems with zonal and meridional statistics:
Looks like there's been progress with issues 1 and 3 here, but what about issue 2?
`zonal_means` seems to be used successfully in two recipes: `recipe_collins13ipcc.yml` and `recipe_flato13ipcc`. So, the functionality is there; it is just a matter of naming.
For consistency with the other `_statistics` functions, we should rename it to `zonal_statistics`.
This is an extension of Issue #1117 and PR #1123.
The following jobs need to be completed for the documentation paper (Table 1):
- Split the `zonal_means` preprocessor function into two functions: `zonal_statistics` and `meridional_statistics`.
- Rename `mean_type` to `operator`, like in the new `area_statistics`.
- … `fx_files`.
- Move the `get_iris_analysis_operation` function into a new file, `_shared.py`, in the preprocessor folder.
- … the `volume_statistics` function, and add documentation to the header of that function.