Rarell / Thesis_Research

The materials used for conducting Stuart Edris's Master's thesis research at the University of Oklahoma.

Help on Flash Drought Frequency Statistics #1

Open YANG11SD opened 1 year ago

YANG11SD commented 1 year ago

Good afternoon Professor, I have now identified flash droughts and stored them as .npz data in which flash drought regions are marked as 1. However, I am having trouble calculating flash drought frequencies across multiple dimensions. How do I produce a map of flash drought frequencies? I have read the article "Application of an improved spatio-temporal identification method of flash droughts", which reads as follows: “The marked binary grids are then classified into drought patches by means of the Contiguous Drought Area (CDA) analysis (Corzo Perez et al., 2011; Diaz et al., 2020). In this approach, a two-scan algorithm is applied. The first run is performed to connect the given grid with its 8 neighbors and assign a provisional label to the contiguous grids.” Can you give me some advice? I would be extremely grateful!

Rarell commented 1 year ago

Greetings,

Basically, to create a map you perform some calculation that collapses the time dimension. For the flash drought (FD) frequency, I generally remove points that identify multiple FD in the same year (for the figure, I only want to count 1 FD in each year), then I sum over the time axis to get the total number of years that experienced FD (you can also divide by the total number of years in the dataset to get the percentage of years in which FD was found). In terms of calculating FD for multiple dimensions, I have a function (attached at the bottom; it is christian_fd(), and most of the remaining functions are support functions it uses) that will take in a multidimensional array of SESR and identify FD for each pentad and grid point in that array. I also added a function I have been using for making the frequency maps, so you can see how that works (the function is display_fd_climatology()).
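As a rough sketch of that frequency calculation (the fd_frequency_map name and exact form below are only for illustration, not the exact code from the repository), assuming fd is a binary time x lat x lon array with 1 where FD was identified and years gives the year of each time step:

```python
import numpy as np

def fd_frequency_map(fd, years):
    # fd    : binary FD array in time x lat x lon format (1 where FD was identified)
    # years : 1D array giving the year of each time step in fd
    all_years = np.unique(years)
    yearly_fd = np.zeros((all_years.size,) + fd.shape[1:])

    for i, year in enumerate(all_years):
        ind = np.where(years == year)[0]
        # Count each year at most once, regardless of how many FD pentads occurred in it
        yearly_fd[i] = (np.nansum(fd[ind], axis = 0) > 0).astype(float)

    # Fraction of years in which FD occurred at each grid point
    return np.nansum(yearly_fd, axis = 0) / all_years.size
```

Multiplying the result by 100 gives the percentage of years with FD, which is essentially what display_fd_climatology() below plots.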

I will also add a note here: this code is quite old (I had even forgotten this repository was here). This project was eventually shelved, as the GFS was not able to represent surface evaporation well. The project that took over from it was moved to a new repository that better reflects its focus (decomposition of FD components), located here: FD_Component_Decomposition. The code I have attached here is from a newer project (not yet published) and is designed to streamline the FD identification process and make it a little easier to use.

Lastly, on drought patches: they are another approach to FD identification, especially useful for looking at specific events or for ensuring the FD spans a large enough area. However, I have not used them myself and do not know how to implement drought patches in this code (it is a problem I have run into before).
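If you want to experiment with them, though, the contiguous-area step the paper describes (connecting each grid cell to its 8 neighbors) corresponds to standard connected-component labeling, which scipy.ndimage.label can do with an 8-connectivity structure. The sketch below is only illustrative and untested here; the fd_slice and min_cells names are my own and are not part of the attached code:

```python
import numpy as np
from scipy import ndimage

def label_drought_patches(fd_slice, min_cells = 10):
    # fd_slice  : 2D binary FD map (lat x lon) for a single time step (1 = FD)
    # min_cells : discard patches covering fewer than this many grid cells
    binary = np.nan_to_num(fd_slice) > 0

    # 8-connectivity: each cell is connected to all 8 of its neighbors
    structure = np.ones((3, 3), dtype = int)
    labels, n_patches = ndimage.label(binary, structure = structure)

    # Remove patches that are too small to be considered a drought patch
    for patch_id in range(1, n_patches + 1):
        if np.sum(labels == patch_id) < min_cells:
            labels[labels == patch_id] = 0

    return labels
```

Applying this to each time step of your binary .npz array would give labeled patches whose sizes you could threshold or track through time.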

I hope this is all of use.

Sincerely, Stuart Edris

```python
import numpy as np
from scipy import stats
from scipy import interpolate
from scipy import signal
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
from matplotlib import colorbar as mcolorbar
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import cartopy.mpl.ticker as cticker
import cartopy.io.shapereader as shpreader

#%%

##############################################

# Create a function to collect climatological data

def collect_climatology(X, dates, start_year, end_year):
    '''
    Extract data between the beginning and ending years.

    Inputs:
    :param X: Input 3D data, in time x lat x lon format
    :param dates: Array of datetimes corresponding to the time axis of X
    :param start_year: The beginning year in the interval being searched
    :param end_year: The ending year in the interval being searched

    Outputs:
    :param X_climo: X between the specified start and end dates
    '''

    # Turn the start and end years into datetimes
    begin_date = datetime(start_year, 1, 1)
    end_date   = datetime(end_year, 12, 31)

    # Determine all the points between the start and end points
    ind = np.where( (dates >= begin_date) & (dates <= end_date) )[0]

    if len(X.shape) < 3:
        X_climo = X[ind]
    else:
        X_climo = X[ind,:,:]

    return X_climo

#%%

##############################################

# Calculate the climatological means and standard deviations

def calculate_climatology(X, pentad = True):
    '''
    Calculates the climatological mean and standard deviation of gridded data.
    Climatological data is calculated for all grid points and for all timestamps in the year.

    Inputs:
    :param X: 3D variable whose mean and standard deviation will be calculated.
    :param pentad: Boolean value giving if the time scale of X is
                   pentad (5 day average) or daily.

    Outputs:
    :param clim_mean: Calculated mean of X for each day/pentad and grid point.
                      clim_mean has the same spatial dimensions as X and a 365 (73)
                      temporal dimension for daily (pentad) data.
    :param clim_std: Calculated standard deviation for each day/pentad and grid point.
                     clim_std has the same spatial dimensions as X and a 365 (73)
                     temporal dimension for daily (pentad) data.
    '''

    # Obtain the dimensions of the variable
    if len(X.shape) < 3:
        T = X.size
    else:
        T, I, J = X.shape

    # Determine the length of a year (in pentads or days)
    if pentad is True:
        year_len = int(365/5)
    else:
        year_len = int(365)

    # Count the number of years
    N_years = int(np.ceil(T/year_len))

    # Create a variable for each day, assumed starting at Jan 1 and no
    #   leap years (i.e., each year is only 365 days each)
    day = np.ones((T)) * np.nan

    n = 0
    for i in range(1, N_years+1):
        if i >= N_years:
            day[n:T+1] = np.arange(1, len(day[n:T+1])+1)
        else:
            day[n:n+year_len] = np.arange(1, year_len+1)

        n = n + year_len

    # Initialize the climatological mean and standard deviation variables
    if len(X.shape) < 3:
        clim_mean = np.ones((year_len)) * np.nan
        clim_std  = np.ones((year_len)) * np.nan
    else:
        clim_mean = np.ones((year_len, I, J)) * np.nan
        clim_std  = np.ones((year_len, I, J)) * np.nan

    # Calculate the mean and standard deviation for each day and at each grid
    #   point
    for i in range(1, year_len+1):
        ind = np.where(i == day)[0]

        if len(X.shape) < 3:
            clim_mean[i-1] = np.nanmean(X[ind], axis = 0)
            clim_std[i-1]  = np.nanstd(X[ind], axis = 0)
        else:
            clim_mean[i-1,:,:] = np.nanmean(X[ind,:,:], axis = 0)
            clim_std[i-1,:,:]  = np.nanstd(X[ind,:,:], axis = 0)

    return clim_mean, clim_std

#%%

##############################################

# Function to remove sea data points

def apply_mask(data, mask):
    '''
    Turn sea points into NaNs based on a land-sea mask where 0 is sea and 1 is land.

    Inputs:
    :param data: Data to be masked
    :param mask: Land-sea mask. Must have the same spatial dimensions as data

    Outputs:
    :param data_mask: Data with all labeled sea grids set to NaN
    '''

    T, I, J = data.shape

    data_mask = np.ones((T, I, J)) * np.nan
    for t in range(T):
        data_mask[t,:,:] = np.where(mask == 1, data[t,:,:], np.nan)

    return data_mask

#%%

##############################################

# Function to calculate SESR
# Details on SESR can be found in the Christian et al. 2019 paper.

def calculate_sesr(et, pet, dates, mask, start_year = 1990, end_year = 2020, years = None, months = None, days = None):
    '''
    Calculate the standardized evaporative stress ratio (SESR) from ET and PET.

    Full details on SESR can be found in Christian et al. 2019: https://doi.org/10.1175/JHM-D-18-0198.1.

    Inputs:
    :param et: Input evapotranspiration (ET) data
    :param pet: Input potential evapotranspiration (PET) data. Should be in the same units as et
    :param dates: Array of datetimes corresponding to the timestamps in et and pet
    :param mask: Land-sea mask for the et and pet variables
    :param start_year: The start year in the climatological period used
    :param end_year: The last year in the climatological period used
    :param years: Array of integers corresponding to dates.year. If None, it is made from dates
    :param months: Array of integers corresponding to dates.month. If None, it is made from dates
    :param days: Array of integers corresponding to dates.day. If None, it is made from dates

    Outputs:
    :param sesr: Calculated SESR; has the same shape and size as et/pet
    '''

    # Make the years, months, and/or days variables?
    if years is None:
        years = np.array([date.year for date in dates])

    if months is None:
        months = np.array([date.month for date in dates])

    if days is None:
        days = np.array([date.day for date in dates])

    # Obtain the evaporative stress ratio (ESR); the ratio of ET to PET
    esr = et/pet

    # Remove values that exceed a certain limit as they are likely an error
    esr[esr < 0] = np.nan
    esr[esr > 3] = np.nan

    # Collect the climatological data for the ESR
    esr_climo = collect_climatology(esr, dates, start_year = start_year, end_year = end_year)

    # Determine the climatological mean and standard deviations of ESR
    esr_mean, esr_std = calculate_climatology(esr_climo, pentad = True)

    # Find the time stamps for a singular year
    ind = np.where(years == 1999)[0] # Note, any non-leap year will do
    one_year = dates[ind]

    # Calculate SESR; it is the standardized ESR
    T, I, J = esr.shape

    sesr = np.ones((T, I, J)) * np.nan

    for n, date in enumerate(one_year):
        ind = np.where( (date.month == months) & (date.day == days) )[0]

        for t in ind:
            sesr[t,:,:] = (esr[t,:,:] - esr_mean[n,:,:])/esr_std[n,:,:]

    # Remove any unrealistic points
    sesr = np.where(sesr < -5, -5, sesr)
    sesr = np.where(sesr > 5, 5, sesr)

    # Remove any sea data points
    # This can be commented out if there is no land-sea mask
    sesr = apply_mask(sesr, mask)

    return sesr

# Create a function to calculate flash droughts using an improved version of the FD
#   identification method from Christian et al. 2019
# This method uses SESR to identify FD

def christian_fd(sesr, mask, dates, years = None, months = None, days = None):
    '''
    Calculate flash drought using an updated version of the method described in
    Christian et al. 2019 (https://doi.org/10.1175/JHM-D-18-0198.1). This method uses
    the standardized evaporative stress ratio (SESR) to identify flash drought.
    Updates to the method are detailed (for LSWI) in Christian et al. 2022
    (https://doi.org/10.1016%2Fj.rsase.2022.100770).

    Inputs:
    :param sesr: Input SESR values, time x lat x lon format
    :param mask: Land-sea mask for the SESR variable. A value of None can be provided to not use a mask
    :param dates: Array of datetimes corresponding to the timestamps in sesr
    :param years: Array of integers corresponding to dates.year. If None, it is made from dates
    :param months: Array of integers corresponding to dates.month. If None, it is made from dates
    :param days: Array of integers corresponding to dates.day. If None, it is made from dates

    Outputs:
    :param fd: The identified flash drought for all grid points and time steps in sesr
    '''

    # Make the years, months, and/or days variables?
    if years is None:
        years = np.array([date.year for date in dates])

    if months is None:
        months = np.array([date.month for date in dates])

    if days is None:
        days = np.array([date.day for date in dates])

    # Is a mask provided?
    mask_provided = mask is not None

    # Initialize some variables
    T, I, J = sesr.shape
    sesr_inter = np.ones((T, I, J)) * np.nan
    sesr_filt  = np.ones((T, I, J)) * np.nan

    sesr2d = sesr.reshape(T, I*J, order = 'F')
    sesr_inter2d = sesr_inter.reshape(T, I*J, order = 'F')
    sesr_filt2d  = sesr_filt.reshape(T, I*J, order = 'F')

    if mask_provided:
        mask2d = mask.reshape(I*J, order = 'F')

    x = np.arange(-6.5, 6.5, (13/T))[:-1] # a variable covering the range of all SESR values with 1 entry for each time step
    print(x.size, T)

    # Parameters for the Savitzky-Golay filter
    WinLength = 21 # Window length of 21 pentads
    PolyOrder = 4

    # Perform a basic linear interpolation for NaN values and apply a SG filter
    print('Applying interpolation and Savitzky-Golay filter to SESR')
    for ij in range(I*J):
        if mask_provided:
            if mask2d[ij] == 0:
                continue

        # Perform a linear interpolation to remove NaNs
        ind = np.isfinite(sesr2d[:,ij])
        if np.nansum(ind) == 0:
            continue

        ind = np.where(ind == True)[0]
        interp_func = interpolate.interp1d(x[ind], sesr2d[ind,ij], kind = 'linear', fill_value = 'extrapolate')

        sesr_inter2d[:,ij] = interp_func(x)

        # Apply the Savitzky-Golay filter to the interpolated SESR data
        sesr_filt2d[:,ij] = signal.savgol_filter(sesr_inter2d[:,ij], WinLength, PolyOrder)

    # Reorder SESR back to 3D data
    sesr_filt = sesr_filt2d.reshape(T, I, J, order = 'F')

    # Determine the change in SESR
    print('Calculating the change in SESR')
    delta_sesr  = np.ones((T, I, J)) * np.nan

    delta_sesr[1:,:,:] = sesr_filt[1:,:,:] - sesr_filt[:-1,:,:]

    # Begin the flash drought calculations
    print('Identifying flash drought')
    fd = np.ones((T, I, J)) * np.nan

    fd2d = fd.reshape(T, I*J, order = 'F')
    dsesr2d = delta_sesr.reshape(T, I*J, order = 'F')

    dsesr_percentile = 25
    sesr_percentile  = 20

    min_change = timedelta(days = 30)
    start_date = dates[-1]

    for ij in range(I*J):
        if mask_provided:
            if mask2d[ij] == 0:
                continue

        start_date = dates[-1]
        for t in range(T):
            ind = np.where( (dates[t].month == months) & (dates[t].day == days) )[0]

            # Determine the percentiles of dSESR and SESR
            ri_crit = np.nanpercentile(dsesr2d[ind,ij], dsesr_percentile)
            dc_crit = np.nanpercentile(sesr_filt2d[ind,ij], sesr_percentile)

            # If start_date != dates[-1], the rapid intensification criterion is satisfied.
            # If the rapid intensification and drought component criteria are satisfied
            #   (and the FD period is 30+ days), then FD occurs
            if ( (dates[t] - start_date) >= min_change) & (sesr_filt2d[t,ij] <= dc_crit):
                fd2d[t,ij] = 1
            else:
                fd2d[t,ij] = 0

            # If the change in SESR is below the criteria, change the start date of the flash drought
            if (dsesr2d[t,ij] <= ri_crit) & (start_date == dates[-1]):
                start_date = dates[t]
            elif (dsesr2d[t,ij] <= ri_crit) & (start_date != dates[-1]):
                pass
            else:
                start_date = dates[-1]

    # Re-order the flash drought back into a 3D array
    fd = fd2d.reshape(T, I, J, order = 'F')
    print('Done')

    return fd

def display_fd_climatology(fd, lat, lon, dates, method, model = 'narr', path = './Figures', grow_season = False, years = None, months = None):
    '''
    Display the climatology of flash drought (percentage of years with flash drought).

    Inputs:
    :param fd: Input flash drought (FD) data to be plotted, time x lat x lon format
    :param lat: Gridded latitude values corresponding to data
    :param lon: Gridded longitude values corresponding to data
    :param dates: Array of datetimes corresponding to the timestamps in fd
    :param method: String describing the method used to calculate the flash drought
    :param model: String describing what reanalysis model the data comes from. Used to name the figure
    :param path: Path the figure will be saved to
    :param grow_season: Boolean indicating whether fd has already been subset to growing seasons
    :param years: Array of integers corresponding to dates.year. If None, it is made from dates
    :param months: Array of integers corresponding to dates.month. If None, it is made from dates
    '''

    # Make the years and/or months variables?
    if years is None:
        years = np.array([date.year for date in dates])

    if months is None:
        months = np.array([date.month for date in dates])

    ### Calculate the climatology ###

    # Initialize variables
    T, I, J = fd.shape
    all_years = np.unique(years)

    ann_fd = np.ones((all_years.size, I, J)) * np.nan

    # Calculate the average number of rapid intensifications and flash droughts in a year
    for y in range(all_years.size):
        if grow_season:
            y_ind = np.where( (all_years[y] == years) )[0]
        else:
            y_ind = np.where( (all_years[y] == years) & ((months >= 4) & (months <= 10)) )[0] # Second set of conditions ensures only growing season values

        # Calculate the mean number of flash droughts for each year
        ann_fd[y,:,:] = np.nanmean(fd[y_ind,:,:], axis = 0)

        # Turn nonzero values to 1 (each year gets 1 count to the total)
        ann_fd[y,:,:] = np.where(( (ann_fd[y,:,:] == 0) | (np.isnan(ann_fd[y,:,:])) ),
                                 ann_fd[y,:,:], 1) # This changes nonzero, non-NaN values to 1; 0 and NaN (sea) values are left alone.

    # Calculate the percentage of years with rapid intensifications and flash droughts
    per_ann_fd = np.nansum(ann_fd[:,:,:], axis = 0)/all_years.size

    # Turn 0 values into NaN
    per_ann_fd = np.where(per_ann_fd != 0, per_ann_fd, np.nan)

    #### Create the Plot ####

    # Set colorbar information
    cmin = -20; cmax = 80; cint = 1
    clevs = np.arange(-20, cmax + cint, cint)
    nlevs = len(clevs)
    cmap  = plt.get_cmap(name = 'hot_r', lut = nlevs)

    # Get the normalized color values
    norm = mcolors.Normalize(vmin = 0, vmax = cmax)

    # Generate the colors from the original color map in range from [0, cmax]
    colors = cmap(np.linspace(1 - (cmax - 0)/(cmax - cmin), 1, cmap.N)) # Note, in the event cmin and cmax share the same sign, 1 - (cmax - cmin)/cmax should be used
    colors[:4,:] = np.array([1., 1., 1., 1.]) # Change the value of 0 to white

    # Create a new colorbar cut from the colors in range [0, cmax]
    ColorMap = mcolors.LinearSegmentedColormap.from_list('cut_hot_r', colors)

    colorsNew = cmap(np.linspace(0, 1, cmap.N))
    colorsNew[abs(cmin)-1:abs(cmin)+1, :] = np.array([1., 1., 1., 1.]) # Change the value of 0 in the plotted colormap to white
    cmap = mcolors.LinearSegmentedColormap.from_list('hot_r', colorsNew)

    # Shapefile information
    # ShapeName = 'Admin_1_states_provinces_lakes_shp'
    ShapeName = 'admin_0_countries'
    CountriesSHP = shpreader.natural_earth(resolution = '110m', category = 'cultural', name = ShapeName)

    CountriesReader = shpreader.Reader(CountriesSHP)

    USGeom = [country.geometry for country in CountriesReader.records() if country.attributes['NAME'] == 'United States of America']
    NonUSGeom = [country.geometry for country in CountriesReader.records() if country.attributes['NAME'] != 'United States of America']

    # Longitude and latitude tick information
    lat_int = 10
    lon_int = 20

    LatLabel = np.arange(-90, 90, lat_int)
    LonLabel = np.arange(-180, 180, lon_int)

    LonFormatter = cticker.LongitudeFormatter()
    LatFormatter = cticker.LatitudeFormatter()

    # Projection information
    data_proj = ccrs.PlateCarree()
    fig_proj  = ccrs.PlateCarree()

    # Create the plots
    fig = plt.figure(figsize = [12, 10])

    # Flash Drought plot
    ax = fig.add_subplot(1, 1, 1, projection = fig_proj)

    # Set the flash drought title
    ax.set_title('Percent of Years from %s - %s with %s Flash Drought'%(all_years[0], all_years[-1], method), size = 18)

    # Ocean and non-U.S. countries cover and "mask" data outside the U.S.
    ax.add_feature(cfeature.OCEAN, facecolor = 'white', edgecolor = 'white', zorder = 2)
    ax.add_feature(cfeature.STATES)
    ax.add_geometries(USGeom, crs = fig_proj, facecolor = 'none', edgecolor = 'black', zorder = 3)
    ax.add_geometries(NonUSGeom, crs = fig_proj, facecolor = 'white', edgecolor = 'white', zorder = 2)

    # Adjust the ticks
    ax.set_xticks(LonLabel, crs = ccrs.PlateCarree())
    ax.set_yticks(LatLabel, crs = ccrs.PlateCarree())

    ax.set_yticklabels(LatLabel, fontsize = 18)
    ax.set_xticklabels(LonLabel, fontsize = 18)

    ax.xaxis.set_major_formatter(LonFormatter)
    ax.yaxis.set_major_formatter(LatFormatter)

    # Plot the flash drought data
    cs = ax.pcolormesh(lon, lat, per_ann_fd*100, vmin = cmin, vmax = cmax,
                       cmap = cmap, transform = data_proj, zorder = 1)

    # Set the map extent to the U.S.
    ax.set_extent([-130, -65, 23.5, 48.5])

    # Set the colorbar size and location
    cbax = fig.add_axes([0.915, 0.29, 0.025, 0.425])

    # Create the colorbar
    cbar = mcolorbar.ColorbarBase(cbax, cmap = ColorMap, norm = norm, orientation = 'vertical')

    # Set the colorbar label
    cbar.ax.set_ylabel('% of years with Flash Drought', fontsize = 18)

    # Set the colorbar ticks
    cbar.set_ticks(np.arange(0, 90, 10))
    cbar.ax.set_yticklabels(np.arange(0, 90, 10), fontsize = 16)

    # Show the figure
    plt.show(block = False)

    # Save the figure
    filename = '%s_%s_flash_drought_climatology.png'%(model, method)
    plt.savefig('%s/%s'%(path, filename), bbox_inches = 'tight')
    plt.show(block = False)
```

YANG11SD commented 1 year ago

Dear Prof. Edris, thank you very much for your reply. If I have any more questions, I will continue to ask you for advice; thank you for all your help. Finally, I wish you good health and academic success. Sincerely yours, YANG