Proposal of a rain season group of climate indices

Ouranosinc / xclim

Library of derived climate variables, ie climate indicators, based on xarray.

https://xclim.readthedocs.io/en/stable/

Apache License 2.0

331 stars 59 forks source link

Proposal of a rain season group of climate indices #842

Closed ghost closed 1 year ago

ghost commented 3 years ago

xclim version: 0.25.0
Python version: 3.7.6
Operating System: LinuxMint 20.1

Description

A group of climate indices related to rain season has been developed (currently available in /scen_workflow_afr/indices.py). The algorithm is currently being tested (using climate data from West Africa) and documented (link methods to scientific literature). These indices can be used to identify the impacts of climate change in countries where a monsoon is present. It could potentially be used with a snow xarray.Dataset.

What I Did

A call to the developed function returns 4 xarray.DataArray instances : 'rainstart' (beginning of season), 'rainend' (end of season), 'raindur' (duration of season) and 'rainqty' (amount of precipitation received during season). Parameters allow adjusting detection rules to different countries or regions. 3 methods were implemented for the end of the season. The detection rules are summarized below:

rainstart : The season begins when the total amount of precipitation received (_Ptotal) over a period of _Dstart consecutive days is greater than a threshold value (_Pthreshold), and if there is no pause longer than _Ddry days within a period of _Dwet days since the beginning of the rain season. Optionally, the user can specify the search range for the first day of the season (between day _DOYmin and _DOYmax).

rainend ('depletion' method): The season ends after a water column of height _Pthreshold has completed evaporated, considering that evapotranspiration (a constant daily evaporation rate of _ETPthreshold or a xarray.DataArray of projected values can be provided) and precipitation both affect the height of the water column.

rainend ('event' method) : The season ends on the last day of a period of _Dthreshold consecutive days during which the daily precipitation amount (_Pday) is lower than a threshold value (_Pthreshold).

rainend ('total' method) : The season ends on the last day of a period of _Dthreshold consecutive days during which the total amount of precipitation received (_Ptotal) is lower than a threshold value (_Pthreshold).

rainend (any method) : Optionally, the user can specify the search range for the last day of the season (between day _DOYmin and _DOYmax). If there are multiple rain seasons in a year (which means several 'rainseason' groups of indices), it is possible to impose an upper limit, which is based on the beginning of the following rain season. This allows to avoid an overlap between two seasons.

raindur: It is the difference between 'rainend' and 'rainstart', plus one.

rainqty: It is the total amount of precipitation received between 'rainstart' and 'rainend'.

It is possible that no value is found locally (a few cells) in the resulting xarray.DataArrays. That's not necessarily a problem, since interpolation can be used to replace 'nan' values (strategy used in scen_workflow_afr) or simply ignored when calculating statistics. Too many missing values could mean that the detection rules are not compatible with the climate in the studied region.

What I Received

This section is not required.

aulemahal commented 3 years ago

This indicator would be interesting!

I think I would suggest implementing rainend and rainstart as there own indices. Something in the lines of rain_season_start and rain_season_end. Then a rain_season could make calls to those?

I note that the existing xc.indices.run_length.season is too limited for reuse in this case.

However, do I understand that there can be more than one rain season per period? This means that the output axis is not known in advance? That will no play well with the rest of xclim because it's not dask-ready; you can't have undefined dimension sizes for lazy computing. Ideas:

Return the first season only.
Return a fixed number of seasons (with season as a new axis).

ghost commented 3 years ago

In scen_workflow_afr, it's possible to calculate rain season indices separately or at once (in a group of indices). So, it seems to do what you suggested.

The names could be changed to fit better with xclim naming style (ex: rain_season_start rather than rainstart).

There is not limit to the number of rain seasons, there were 2 in Ivory Coast, but only one in Burkina Faso, which is located immediately north. I heard that there are 1 to 3 in Benin, depending on the region. If there are 2 rain seasons, the last parameter of the 1st rainend index is the name of the rainstart index of the 2nd season. For instance, if I declare the groups of climate indices rainseason_1 and rainseason_2 in the configuration file, the xarray.DataArray rainend_1 (comprised in the xarray.Dataset rainseason_1) will impose an upper limit according to the values in xarray.DataArray rainstart_2 (comprised in the xarray.Dataset rainseason_2). This is essential to avoid an overlap between rain seasons. The implication is that the last rain season start of the year must be calculated before the first one (rainseason_2 before rainseason_1). This formulation may seem unnecessarily complex, but the algorithm is supposed to cope with a season starting at the end of a year and ending at the beginning of the following year. Not all of this is automatic. The user needs to know about these limitations.

There could be more than one rain season per period, but the resulting xarray.Dataset will only pertain to one season. This means that the user must know how many seasons there are in the studied region before setting up the parameters for each rain season group of indices in config.ini. This seems compatible with dask.

In general, scen_workflow_afr allows to calculate the same index with diverse sets of parameters. For instance, tx_days_above_1, tx_days_above_2 and tx_days_above_3 could calculate the indices for threshold temperatures of 30C, 35C and 40C. However, rain season indices are the only ones that can be connected (it's optional).

aulemahal commented 3 years ago

Oh! I see. I believe we will need to modify the approach a bit to fit this into xclim. There are 2 main approaches for xclim indicators currently: non-resampling and resampling. The former do not alter the time dimension while the latter return an aggregation/reduction to a fixed frequency. The resampling is done by xarray and as such we are a bit limited : for each period the function must return a single value. As all this is in xclim, it's not a problem to write something that works differently, it just adds on the maintenance complexity and should be thought through with generality in mind.

So here what I was envisioning is a resampling function returning DataArrays that have an extra "season" dimension for each found dimensions. But then we get to the dask problem: the output structure is generated before the computation, so we need to know the number of seasons in advance.

Idea to overcome all this:

The season determination could be similar to fire_season (https://xclim.readthedocs.io/en/stable/indices.html?highlight=fire_season#xclim.indices.fire_season), i.e. the result is a non-resampled array of booleans, True on the rain seasons, False otherwise.
The season statistics function would be resampling and accept a n_seasons: Optional[int, str] argument.
- If None, the resulting season dimension as the same length as the maximal number of seasons. The function fails if dask arrays are being used.
- If "longest", only the longest season is returned, there is no season dimension on the output. This is also similar to what we do in Fire Weather Indexes, but it may not be applicable to this case. It just felt logical to have it.
- if an integer, only the first n_seasons are returned, this is the length of the new season axis. Years/points with a smaller number of seasons simply have NaNs appended.

Option 2 and 3 are adaptable for dask-backed arrays.

Does this make sense?

ghost commented 3 years ago

@aulemahal The user can specify the number of seasons (that will be the same everywhere in the region), so this is not a problem. Regarding the other details, I'm not familiar with dask, so I would not know where to start and what needs to be adjusted in the current rainseason functions. I looked quickly at the fire_season index, but the detection rules seem simpler than for the rain season. Perhaps, we should have a meeting to discuss this further. Can I send you a Teams meeting invitation? Is there a day/time that would work better for you?

aulemahal commented 3 years ago

Yes we should talk about this de vive voix! My teams schedule is up to date, I have no preference.

ghost commented 2 years ago

The latest version of the algorithms calculating these indices can be found in the module indices.py of the Github project scen_workflow_afr (see functions rain_season, rain_season_start, rain_season_end, rain_season_prcptot, rain_season_length), rather than in the associated pull request #894. The code in the current branch corresponds to an earlier version, which should not be used. A few improvements have been made in Fall 2021, in particular with the method that is based on a constant evapotranspiration rate.

The rain_season calls funtions rain_season_start, rain_season_end, rain_season_prcptot and rain_season_length, in that specific order. This means that it's possible to calculate the 4 indices at once or independently. However, rain_season_prcptot and rain_season_length require the two other indices, but take little time to calculate, which means that there is usually no benefit in calculating an index separately (perhaps if only the rain_start index is required). If the 4 indices are required, calculating them separately simplifies the number of arguments in each call, which is more user-friendly. If there was a way to reduce the number of arguments, that would certainly facilitate the use of these indices.

Zeitsperre commented 2 years ago

@yrouranos Thanks for the update! If you want to close #894 and re-open a new PR for a new branch with this set of indices, we can work on making things consistent with the rest of the indices.

ghost commented 2 years ago

@Zeitsperre I discussed that topic quickly with David this morning. Using the current issue and PR would allow to keep track of previous discussions. However, if you want to close the current PR and re-open a new PR (for consistency), I would recommend transferring the messages to the newly created PR, or referring to the initial PR (#894) from both the current issue and from the new PR. Pascal will integrate/improve these indices when he has time to do so. Perhaps, talk with him first to see which approach he prefers.

Zeitsperre commented 2 years ago

If you'd like to keep the current PR open (will be difficult to do as there may be lots of merge conflicts), you can do so as well. I only propose it since I think it would be much faster/cleaner to port your code changes onto a fresh branch of the trunk.

coxipi commented 2 years ago

@yrouranos I couldn't find a GitHub project named as scen_workflow_afr, could you give me a link to the repo?