ESMValGroup / ESMValCore

ESMValCore: A community tool for pre-processing data from Earth system models in CMIP and running analysis scripts.
https://www.esmvaltool.org
Apache License 2.0
42 stars 38 forks source link

Using derived ERA5 data #1050

Closed lukasbrunner closed 3 years ago

lukasbrunner commented 3 years ago

Question for the community:

I'm currently developing a recipe which is comparing the CMIP6 variable rsus (long_name: Surface Upwelling Shortwave Radiation) with data from ERA5. This variable does not exist in ERA5 so in a first step I did: cdo setpartabn,parameter_table -sub surface_solar_radiation_downwards/era5_surface_solar_radiation_downwards_1979_monthly.nc surface_net_solar_radiation/era5_surface_net_solar_radiation_1979_monthly.nc surface_solar_radiation_upwards/era5_surface_solar_radiation_upwards_1979_monthly.nc (where setpartabn renames the variable) This did not work because the preprocessor could not make units fit (ERA5 is in J m-2 and CMIP6 is in W m-2). NOTE: for the native ERA5 radiation files (like surface_solar_radiation_downwards and the CMIP6 equivalent rsds) this conversion works so I'm assuming this could be added somewhere.

Question 1: Is it possible to make the preprocessor aware of new ERA5 variables like in this case?

For now I have solved this problem by converting the unit offline like: cdo -b F32 setpartabn,parameter_table -divc,86400 -sub surface_solar_radiation_downwards/era5_surface_solar_radiation_downwards_1979_monthly.nc surface_net_solar_radiation/era5_surface_net_solar_radiation_1979_monthly.nc surface_solar_radiation_upwards/era5_surface_solar_radiation_upwards_1979_monthly.nc (where setpartabn renames the variable and now also the unit) This works but I wonder if this is the way it should be done...

Question 2: Is there a protocol to add new observations to ESMValTool and how does the code check (ESMValBot) work in this case? Is there need/way to upload the data as well?

stefsmeets commented 3 years ago

Hi @lukasbrunner , good question! The ERA5 variables are fixed on the fly using the native6 format. The code for this is here: https://github.com/ESMValGroup/ESMValCore/blob/master/esmvalcore/cmor/_fixes/native6/era5.py

I see that the variable rsus is not available, so that file will need to be updated with the unit conversion that you describe, but this is probably not what we would want to do, if ERA5 does not include this variable.

If I understand correctly, you want to combine two different variables from two different datasets. Can you not handle this in the diagnostics?

lukasbrunner commented 3 years ago

Hi @stefsmeets

not quite I want to combine two variables from the same dataset - ERA5. ERA5 provides downwards & net radiation but NOT upwards radiation, while CMIP provides downwards and upwards. I would like to compare the upwards components. So doing it in the diagnostic is a way I was just wondering if it is the preferred way for this case where the variable is in the standard CMIP set. I would prefer not to touch the diagnostic and work in the recipe but if it is the way to go to calculate this in the diagnostic that is also fine.

lukasbrunner commented 3 years ago

Actually I just saw that there is the DerivedVariable class. I wonder if this would be the way to go? Net radiation (I guess for the CMIP case where this does not exist) is calculated there as down - up. So this could do the equivalent thing for up as down - net?

bouweandela commented 3 years ago

Question 1: Is it possible to make the preprocessor aware of new ERA5 variables like in this case?

Yes, I think so. There is this documentation available about using derived variables https://docs.esmvaltool.org/projects/esmvalcore/en/latest/recipe/preprocessor.html#variable-derivation and this for implementing derived variables: https://docs.esmvaltool.org/projects/esmvalcore/en/latest/develop/derivation.html An example recipe that uses all currently implemented derived variables is available here: https://github.com/ESMValGroup/ESMValTool/blob/master/esmvaltool/recipes/examples/recipe_preprocessor_derive_test.yml

Question 2: Is there a protocol to add new observations to ESMValTool and how does the code check (ESMValBot) work in this case? Is there need/way to upload the data as well?

Yes, you ask @remi-kazeroni (see also here). However, I don't think there is a need to store any new data anywhere in this case, because you can use the already available ERA5 data and derive the quantity you need from it, right?

how does the code check (ESMValBot) work in this case?

The ESMValBot has access to any data that is available at DKRZ

lukasbrunner commented 3 years ago

Sweet thanks a lot @bouweandela for these links!!

The variable derivation module allows to derive variables which are not in the CMIP standard data request using standard variables as input. The typical use case of this operation is the evaluation of a variable which is only available in an observational dataset but not in the models.

I have the inverse problem - rsus is available for CMIP BUT NOT for ERA5.

So my understanding is that it is the exact inverse from this here: https://github.com/ESMValGroup/ESMValCore/blob/e47a25f87b3e42184e67108d2b963ec0958deb40/esmvalcore/preprocessor/_derive/rsns.py

No CMOR file needed I guess since it should already exist in this case?

So this seems to work if I have the ERA5 surface_net_solar_radiation variable as rsns in the input dir:

class DerivedVariable(DerivedVariableBase):
    """Derivation of variable `rsus`."""

    @staticmethod
    def required(project):
        """Declare the variables needed for derivation."""
        required = [
            {
                'short_name': 'rsds'  # exists in ERA5 & CMIP
            },
            {
                'short_name': 'rsns'  # exists only in ERA5
            },
        ]
        return required

    @staticmethod
    def calculate(cubes):
        """Compute upwelling shortwave flux from downwelling and net."""
        rsds_cube = cubes.extract_strict(
            Constraint(name='surface_downwelling_shortwave_flux_in_air'))
        rsns_cube = cubes.extract_strict(
            Constraint(name='surface_net_downward_shortwave_flux'))

        rsus_cube = rsds_cube - rsns_cube

        return rsus_cube

But then it fails because it can not derive the unit (ERA5 is in Joule): Unable to convert from 'Unit('J m**-2')' to 'Unit('W m-2')'

So I assume we need a fix equivalent to https://github.com/ESMValGroup/ESMValCore/blob/e47a25f87b3e42184e67108d2b963ec0958deb40/esmvalcore/cmor/_fixes/native6/era5.py#L195-L204

Does that sound reasonable? Is it something you would consider adding to the core? (sorry again for all the questions, I've not been touching the core yet and I don't want to spend the time if this is somehow not something you want to do)

bouweandela commented 3 years ago

I'm not entirely sure what the best solution is here. The way I see it, there are two possible ways to implement this 1) as a derived variable, but you indicate that the derived variable is the inverse of what you would need and would be rather specific to ERA5 2) as a fix specific for ERA5. This seems the more logical choice, because the problem is specific to ERA5.

To implement it as a fix, you would just put the data for both variables rsds and rsns in a directory called rsus, so the tool finds them there and then inside the fix, e.g. as https://github.com/ESMValGroup/ESMValCore/blob/e47a25f87b3e42184e67108d2b963ec0958deb40/esmvalcore/cmor/_fixes/native6/era5.py#L195-L204 you would also add the code to do rsus = rsds - rsns. Could you try if that works?

lukasbrunner commented 3 years ago

Okay I organised the data as you suggested and added this to cmor/_fixes/native6/era5.py

class Rsus(Fix):
    """Fixes for Rsus."""
    def fix_metadata(self, cubes):
        """Fix metadata."""
        for cube in cubes:
            fix_hourly_time_coordinate(cube)
            fix_accumulated_units(cube)
            cube.attributes['positive'] = 'up'

        return cubes

    def fix_data(self, cubes):
        breakpoint()

It works for fixing the metadata but cubes in fix_data is always only one data cube, right? So I can't do the calculation at this point.

bouweandela commented 3 years ago

Good point, can you try doing rsus = rsds - rsns in the Rsus.fix_data method? There you should have all the cubes.

bouweandela commented 3 years ago

cubes in fix_data is always only one data cube, right?

It's all the cubes from a single input file, for ERA5 data that is indeed only one cube.

lukasbrunner commented 3 years ago

Is there another place where this could be done?

Otherwise I have one more idea but I'm not sure how much sense it makes. I could write a recipe like:

      rsus: &common_settings
        short_name: rsus
        additional_datasets: *model_data
      rsns: &common_settings
        short_name: rsns
        additional_datasets: *obs_data
      rsds: &common_settings
        short_name: rsds
        additional_datasets: *obs_data

And do the calculation of rsus for ERA5 in the diagnostic. We would still need to add rsns to the CMOR table since it is not a standard CMIP variable I guess, but that should be easy?

bouweandela commented 3 years ago

Good point, can you try doing rsus = rsds - rsns in the Rsus.fix_data method? There you should have all the cubes.

Did you see and try this https://github.com/ESMValGroup/ESMValTool/issues/2077#issuecomment-800067602?

lukasbrunner commented 3 years ago

Ah sorry maybe I formulated that wrong:

It works for fixing the metadata but cubes in fix_data is always only one data cube, right? So I can't do the calculation at this point.

I already tried it in fix_data when I wrote this. Here is the output of cubes at the breakpoint

ipdb> cubes
<iris 'Cube' of surface_upwelling_shortwave_flux_in_air / (W m-2) (time: 240; latitude: 721; longitude: 1440)>

That seems to be only one variable (240 time steps = 20 years that is what I have in my recipe)? Also at this point I don't know which variable it is as the metadata have been replaced to make it look like upwelling (even though its either downwelling or net).

bouweandela commented 3 years ago

Thanks for explaining. In that case I think the solution you propose here https://github.com/ESMValGroup/ESMValTool/issues/2077#issuecomment-800226289 (doing it in the diagnostic) may be the best option for now.

In the long run, I think we will need to make it possible to derive variables as part of the fixes or make the derived variables more efficient (ESMValGroup/ESMValCore#377) and more modular because we need that to support running models (e.g. ESMValGroup/ESMValCore#312).

We would still need to add rsns to the CMOR table since it is not a standard CMIP variable I guess, but that should be easy?

It's already there: https://github.com/ESMValGroup/ESMValCore/blob/master/esmvalcore/cmor/tables/custom/CMOR_rsns.dat

lukasbrunner commented 3 years ago

Ah great! Although now I'm a bit confused why is it positive: up but long_name: Surface Net downward Shortwave Radiation in that file? My assumption would be that

rsns = rsds - rsus

with rsds -> positve: down and rsus -> positive: up right? So should it not be rsns -> positive: down as well? In line with it being net downward. Or do I have a misconception here?

lukasbrunner commented 3 years ago

Just to clarify: I'm just asking if this is by choice or an oversight. ERA5 defines fluxes as positive: down

Mean surface net short-wave radiation flux This parameter is the amount of solar radiation (also known as shortwave radiation) that reaches a horizontal plane at the surface of the Earth (both direct and diffuse) minus the amount reflected by the Earth's surface (which is governed by the albedo). Radiation from the Sun (solar, or shortwave, radiation) is partly reflected back to space by clouds and particles in the atmosphere (aerosols) and some of it is absorbed. The remainder is incident on the Earth's surface, where some of it is reflected. This parameter is the mean over a particular time period which depends on the data extracted. The ECMWF convention for vertical fluxes is positive downwards.

https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels-monthly-means?tab=overview

This means if the CMOR convention is positive: up I would need to add ad fix_data after all multiplying them by -1 I guess?

bouweandela commented 3 years ago

why is it positive: up but long_name: Surface Net downward Shortwave Radiation in that file?

@valeriupredoi Do you remember why this is? Is it correct? It looks like you added the file in c37db870c8509d8014f454f5109befe337833535

valeriupredoi commented 3 years ago

@bouweandela you are right - downward should be positive: down and upward should be positive: up - my mistake :beer: @lukasbrunner please change as appropriate :+1: