Closed ghost closed 1 year ago
This indicator would be interesting!
I think I would suggest implementing rainend
and rainstart
as there own indices. Something in the lines of rain_season_start
and rain_season_end
. Then a rain_season
could make calls to those?
I note that the existing xc.indices.run_length.season
is too limited for reuse in this case.
However, do I understand that there can be more than one rain season per period? This means that the output axis is not known in advance? That will no play well with the rest of xclim because it's not dask-ready; you can't have undefined dimension sizes for lazy computing. Ideas:
season
as a new axis).In scen_workflow_afr, it's possible to calculate rain season indices separately or at once (in a group of indices). So, it seems to do what you suggested.
The names could be changed to fit better with xclim naming style (ex: rain_season_start
rather than rainstart
).
There is not limit to the number of rain seasons, there were 2 in Ivory Coast, but only one in Burkina Faso, which is located immediately north. I heard that there are 1 to 3 in Benin, depending on the region. If there are 2 rain seasons, the last parameter of the 1st rainend
index is the name of the rainstart
index of the 2nd season. For instance, if I declare the groups of climate indices rainseason_1
and rainseason_2
in the configuration file, the xarray.DataArray
rainend_1
(comprised in the xarray.Dataset
rainseason_1
) will impose an upper limit according to the values in xarray.DataArray
rainstart_2
(comprised in the xarray.Dataset
rainseason_2
). This is essential to avoid an overlap between rain seasons. The implication is that the last rain season start of the year must be calculated before the first one (rainseason_2
before rainseason_1
). This formulation may seem unnecessarily complex, but the algorithm is supposed to cope with a season starting at the end of a year and ending at the beginning of the following year. Not all of this is automatic. The user needs to know about these limitations.
There could be more than one rain season per period, but the resulting xarray.Dataset
will only pertain to one season. This means that the user must know how many seasons there are in the studied region before setting up the parameters for each rain season group of indices in config.ini
. This seems compatible with dask.
In general, scen_workflow_afr allows to calculate the same index with diverse sets of parameters. For instance, tx_days_above_1
, tx_days_above_2
and tx_days_above_3
could calculate the indices for threshold temperatures of 30C, 35C and 40C. However, rain season indices are the only ones that can be connected (it's optional).
Oh! I see. I believe we will need to modify the approach a bit to fit this into xclim. There are 2 main approaches for xclim indicators currently: non-resampling and resampling. The former do not alter the time
dimension while the latter return an aggregation/reduction to a fixed frequency. The resampling is done by xarray and as such we are a bit limited : for each period the function must return a single value. As all this is in xclim, it's not a problem to write something that works differently, it just adds on the maintenance complexity and should be thought through with generality in mind.
So here what I was envisioning is a resampling function returning DataArrays that have an extra "season" dimension for each found dimensions. But then we get to the dask problem: the output structure is generated before the computation, so we need to know the number of seasons in advance.
Idea to overcome all this:
fire_season
(https://xclim.readthedocs.io/en/stable/indices.html?highlight=fire_season#xclim.indices.fire_season), i.e. the result is a non-resampled array of booleans, True on the rain seasons, False otherwise.n_seasons: Optional[int, str]
argument.
None
, the resulting season
dimension as the same length as the maximal number of seasons. The function fails if dask arrays are being used."longest"
, only the longest season is returned, there is no season
dimension on the output. This is also similar to what we do in Fire Weather Indexes, but it may not be applicable to this case. It just felt logical to have it.n_seasons
are returned, this is the length of the new season
axis. Years/points with a smaller number of seasons simply have NaNs appended. Option 2 and 3 are adaptable for dask-backed arrays.
Does this make sense?
@aulemahal The user can specify the number of seasons (that will be the same everywhere in the region), so this is not a problem. Regarding the other details, I'm not familiar with dask, so I would not know where to start and what needs to be adjusted in the current rainseason
functions. I looked quickly at the fire_season
index, but the detection rules seem simpler than for the rain season. Perhaps, we should have a meeting to discuss this further. Can I send you a Teams meeting invitation? Is there a day/time that would work better for you?
Yes we should talk about this de vive voix! My teams schedule is up to date, I have no preference.
The latest version of the algorithms calculating these indices can be found in the module indices.py
of the Github project scen_workflow_afr
(see functions rain_season
, rain_season_start
, rain_season_end
, rain_season_prcptot
, rain_season_length
), rather than in the associated pull request #894. The code in the current branch corresponds to an earlier version, which should not be used. A few improvements have been made in Fall 2021, in particular with the method that is based on a constant evapotranspiration rate.
The rain_season
calls funtions rain_season_start
, rain_season_end
, rain_season_prcptot
and rain_season_length
, in that specific order. This means that it's possible to calculate the 4 indices at once or independently. However, rain_season_prcptot
and rain_season_length
require the two other indices, but take little time to calculate, which means that there is usually no benefit in calculating an index separately (perhaps if only the rain_start
index is required). If the 4 indices are required, calculating them separately simplifies the number of arguments in each call, which is more user-friendly. If there was a way to reduce the number of arguments, that would certainly facilitate the use of these indices.
@yrouranos Thanks for the update! If you want to close #894 and re-open a new PR for a new branch with this set of indices, we can work on making things consistent with the rest of the indices.
@Zeitsperre I discussed that topic quickly with David this morning. Using the current issue and PR would allow to keep track of previous discussions. However, if you want to close the current PR and re-open a new PR (for consistency), I would recommend transferring the messages to the newly created PR, or referring to the initial PR (#894) from both the current issue and from the new PR. Pascal will integrate/improve these indices when he has time to do so. Perhaps, talk with him first to see which approach he prefers.
If you'd like to keep the current PR open (will be difficult to do as there may be lots of merge conflicts), you can do so as well. I only propose it since I think it would be much faster/cleaner to port your code changes onto a fresh branch of the trunk.
@yrouranos I couldn't find a GitHub project named as scen_workflow_afr
, could you give me a link to the repo?
Description
A group of climate indices related to rain season has been developed (currently available in /scen_workflow_afr/indices.py). The algorithm is currently being tested (using climate data from West Africa) and documented (link methods to scientific literature). These indices can be used to identify the impacts of climate change in countries where a monsoon is present. It could potentially be used with a snow
xarray.Dataset
.What I Did
A call to the developed function returns 4
xarray.DataArray
instances : 'rainstart' (beginning of season), 'rainend' (end of season), 'raindur' (duration of season) and 'rainqty' (amount of precipitation received during season). Parameters allow adjusting detection rules to different countries or regions. 3 methods were implemented for the end of the season. The detection rules are summarized below:rainstart : The season begins when the total amount of precipitation received (_Ptotal) over a period of _Dstart consecutive days is greater than a threshold value (_Pthreshold), and if there is no pause longer than _Ddry days within a period of _Dwet days since the beginning of the rain season. Optionally, the user can specify the search range for the first day of the season (between day _DOYmin and _DOYmax).
rainend ('depletion' method): The season ends after a water column of height _Pthreshold has completed evaporated, considering that evapotranspiration (a constant daily evaporation rate of _ETPthreshold or a
xarray.DataArray
of projected values can be provided) and precipitation both affect the height of the water column.rainend ('event' method) : The season ends on the last day of a period of _Dthreshold consecutive days during which the daily precipitation amount (_Pday) is lower than a threshold value (_Pthreshold).
rainend ('total' method) : The season ends on the last day of a period of _Dthreshold consecutive days during which the total amount of precipitation received (_Ptotal) is lower than a threshold value (_Pthreshold).
rainend (any method) : Optionally, the user can specify the search range for the last day of the season (between day _DOYmin and _DOYmax). If there are multiple rain seasons in a year (which means several 'rainseason' groups of indices), it is possible to impose an upper limit, which is based on the beginning of the following rain season. This allows to avoid an overlap between two seasons.
raindur: It is the difference between 'rainend' and 'rainstart', plus one.
rainqty: It is the total amount of precipitation received between 'rainstart' and 'rainend'.
It is possible that no value is found locally (a few cells) in the resulting
xarray.DataArrays
. That's not necessarily a problem, since interpolation can be used to replace 'nan' values (strategy used in scen_workflow_afr) or simply ignored when calculating statistics. Too many missing values could mean that the detection rules are not compatible with the climate in the studied region.What I Received
This section is not required.