Open Thomas-Moore-Creative opened 3 years ago
Next, go away and explore if BOM colleagues have documented code for the generation of ensemble means from ACCESS-S2 lagged ensembles.
two functions, one to get a climatology (get_custom_clim_s2_from_ens), one to get one hindcast startdate data(get_emn_from_ens_s2). Then I subtract whats returned for the climatology from what is returned for the data to get the anomaly.
data_loc='/g/data/ux62/access-s2/hindcast/raw_model/ocean/hc300/monthly/'
m,start_day is the month and day of interest
start_year,end_year is your hindcast period. For all of S2 use 1981 and 2018
t is timestep
limits=if global use [0, 1442, 0, 1021]
first_letter='m'
fn_var and nc_var is 'hc300'
num_ens=3
tl_days=9
def get_custom_clim_s2_from_ens(data_loc,m,start_year,end_year,start_day,t,limits,first_letter,fn_var,nc_var,num_ens,tl_days):
print('clim',m,start_day,t)
#Hindcast period and ensemble number hard coded
tls=[]
for tl in range(0,tl_days):
yrs=[]
for y in range(start_year,end_year+1):
es=[]
for e in range(1,num_ens+1):
date_stamp=datetime.date(y,m,start_day)
if y == 1981 and m == 1:
date_stamp=date_stamp
else:
date_stamp=date_stamp-datetime.timedelta(days=tl)
file=data_loc+'/e'+str(e).zfill(2)+'/'+first_letter+'o_'+fn_var+'_'+date_stamp.strftime("%Y%m%d")+'_e'+str(e).zfill(2)+'.nc'
print(file)
try:
cdf=Dataset(file,'r')
data=cdf.variables[nc_var][t,0,limits[2]:limits[3]+1,limits[0]:limits[1]+1]
cdf.close()
es.append(data)
except ValueError:
cdf=Dataset(file,'r')
data=cdf.variables[nc_var][t,limits[2]:limits[3]+1,limits[0]:limits[1]+1]
cdf.close()
es.append(data)
es=np.nanmean(es,axis=0)
yrs.append(es)
tls.append(np.nanmean(yrs,axis=0))
print "tls=",np.shape(tls)
return np.array(tls)
data_loc='/g/data/ux62/access-s2/hindcast/raw_model/ocean/hc300/monthly/'
y,m,start_day,t are the start date and time step of interest
limits=if global use [0, 1442, 0, 1021]
first_letter='m'
fn_var and nc_var is 'hc300'
num_ens=3
tl_days=9
def get_emn_from_ens_s2(data_loc,y,m,start_day,t,limits,first_letter,fn_var,nc_var,num_ens,tl_days):
#Hindcast period and ensemble number hard coded
tls=[]
for tl in range(0,tl_days):
ens=[]
for e in range(1,num_ens+1):
date_stamp=datetime.date(y,m,start_day)
date_stamp=date_stamp-datetime.timedelta(days=tl)
file=data_loc+'/e'+str(e).zfill(2)+'/'+first_letter+'o_'+fn_var+'_'+date_stamp.strftime("%Y%m%d")+'_e'+str(e).zfill(2)+'.nc'
print(file)
cdf=Dataset(file, 'r')
data=cdf.variables[nc_var][t,0,limits[2]:limits[3]+1,limits[0]:limits[1]+1]
cdf.close()
ens.append(data)
tls.append(np.nanmean(ens,axis=0))
mask=np.ma.getmask(data)
return tls,mask
There are a few possible approaches - all will potentially change how an "ensemble mean" for S2 might be interpreted under different skill assessments
Example directory structure from:
/g/data/ux62/access-s2/hindcast/raw_model/ocean/hc300/monthly/e01
File naming:
mo_variable_YYYYMMDD_ensemble.nc
Consider coding up the following for comparison?
Simple day 1
= Average just the three ensembles frommo_variable_YYYYMM01
into anemean
productSimple day 16
= Average just the three ensembles frommo_variable_YYYYMM16
into anemean
productFull day 1
= Average across all three ensembles andmo_variable_YYYYMM21/22/23/24-01
taking into account shifting the lead times for files with dates21/22/23/24 - 28/29/30/31
Weighted full day 1
= same asFull day 1
but with weights applied - ??? what weights and why ???