Closed anoukvlug closed 7 years ago
Hi Anouk,
This is getting quite interesting, I'd like to know more about your plans ;-) But first, to your question:
I expect that solving this problem won't be trivial, as there is an underlying problem in pandas and xarray (see e.g. https://github.com/pydata/xarray/issues/789 for xarray, which relies on pandas: https://github.com/pandas-dev/pandas/issues/7307 ).
OGGM does not rely heavily on pandas datetimes so I think we could solve this problem without getting burned in a hell of changes, but on the top of my head I can't really make a prognostic about how much change this would imply.
In fact, the dynamical model of OGGM already relies on a very simple time system (https://github.com/OGGM/oggm/blob/master/oggm/utils.py#L675-L718), which could be meliorated and extended (based on other peoples systems, as I hope we don't have to reinvent the wheel here).
I know this is not very helpful, but yes a change that would make OGGM work on longer timescales is necessary (I just didn't have that in mind when I started :-( )
Hello Fabien, Thank you for your quick response. My first idea is to change
yrs = nc_ts.time.year
y0, y1 = yrs[0], yrs[-1]
into
time1 = nc_ts.variables['time']
time1 = netCDF4.num2date(time1[:], time1.units)
y0=int(time1[0].strftime('%Y'))
y1=int(time1[-1].strftime('%Y'))
to avoid using pandas.
I don't know yet what to do with these lines:
nc_ts.set_period(t0='{}-10-01'.format(y0), t1='{}-09-01'.format(y1))
gdir.write_monthly_climate_file(time, iprcp, itemp, igrad, ihgt, ref_pix_lon, ref_pix_lat)
So far these are the only lines that has give me problems with the starting date. So hopefully this requires just a small change. I will let you know once I know more about what I what to do with those lines.
Shortly summarised I would like to run OGGM with over the last ~1000 years with different climate simulations of the Community Earth System Model Last Millennium Ensemble (CESM-LME), to look at the influence of natural climate variability on glacier in the Canadian Arctic. Later on in my PhD I would like to use OGGM to make runs over the second half of the Holocene.
gdir.write_monthly_climate_file(time, iprcp, itemp, igrad, ihgt, ref_pix_lon, ref_pix_lat)
Doesn't give a problem with the proper input file. I thought you could set the date of 'days since' yourself, but if you use just 'days since 1801-01-01' than it all works fine.
Shortly summarised I would like to run OGGM with over the last ~1000 years with different climate simulations of the Community Earth System Model Last Millennium Ensemble (CESM-LME)
Right, I see. So there are good news and bad news about that ;-)
Here follows an update on the conversation we had on Slack:
I made a 'process_gcm_data' task in climate.py, which is based on a copy of the 'process_cru_data' task.
` @entity_task(log, writes=['gcm_data']) def process_gcm_data(gdir):
# read the climatology high resolution CL2 climatologies
clfile = utils.get_cru_cl_file()
ncclim = salem.GeoNetcdf(clfile)
# GCM temperature and precipitation data
if not (('gcm_temp_file' in cfg.PATHS) and \
os.path.exists(cfg.PATHS['gcm_temp_file'])):
raise IOError('GCM temp file not found')
if not (('gcm_precc_file' in cfg.PATHS) and \
os.path.exists(cfg.PATHS['gcm_precc_file'])):
raise IOError('GCM precc file not found')
if not (('gcm_precl_file' in cfg.PATHS) and \
os.path.exists(cfg.PATHS['gcm_precl_file'])):
raise IOError('GCM precl file not found')
# read the file
fpathTemp = cfg.PATHS['gcm_temp_file']
fpathPrecc = cfg.PATHS['gcm_precc_file']
fpathPrecl = cfg.PATHS['gcm_precl_file']
temp = salem.open_xr_dataset(fpathTemp)
temp = temp.TREFHT
temp.time
temp = temp[9:-3, :, :] # from normal years to hydrological years
precpC = xr.open_dataset(fpathPrecc)
precpL = xr.open_dataset(fpathPrecl)
precp = precpC.PRECC+precpL.PRECL
precp = precp[9:-3, :, :] # from normal years to hydrological years
precp.time
temp['time'] = pd.period_range('850-10', '2005-9', freq='M')
precp['time'] = pd.period_range('850-10', '2005-9', freq='M')
# Use the year and month for selection
year = np.array([t.year for t in temp.time.values])
month = np.array([t.month for t in temp.time.values])
year = np.array([t.year for t in precp.time.values])
month = np.array([t.month for t in precp.time.values])
time= temp.time
y0=year[0]
y1=year[-1]
ny, r = divmod(len(temp.time), 12)
assert r == 0
# gradient default params
use_grad = cfg.PARAMS['temp_use_local_gradient']
def_grad = cfg.PARAMS['temp_default_gradient']
g_minmax = cfg.PARAMS['temp_local_gradient_bounds']
lon = gdir.cenlon
lat = gdir.cenlat
# This is guaranteed to work because I prepared the file (I hope)
ncclim.set_subset(corners=((lon, lat), (lon, lat)), margin=1)
# get climatology data
loc_hgt = ncclim.get_vardata('elev')
loc_tmp = ncclim.get_vardata('temp')
loc_pre = ncclim.get_vardata('prcp')
loc_lon = ncclim.get_vardata('lon')
loc_lat = ncclim.get_vardata('lat')
# see if the center is ok
if not np.isfinite(loc_hgt[1, 1]):
# take another candidate where finite
isok = np.isfinite(loc_hgt)
# wait: some areas are entirely NaNs, make the subset larger
_margin = 1
while not np.any(isok):
_margin += 1
ncclim.set_subset(corners=((lon, lat), (lon, lat)), margin=_margin)
loc_hgt = ncclim.get_vardata('elev')
isok = np.isfinite(loc_hgt)
if _margin > 1:
log.debug('%s: I had to look up for far climate pixels: %s',
gdir.rgi_id, _margin)
# Take the first candidate (doesn't matter which)
lon, lat = ncclim.grid.ll_coordinates
lon = lon[isok][0]
lat = lat[isok][0]
# Resubset
ncclim.set_subset()
ncclim.set_subset(corners=((lon, lat), (lon, lat)), margin=1)
loc_hgt = ncclim.get_vardata('elev')
loc_tmp = ncclim.get_vardata('temp')
loc_pre = ncclim.get_vardata('prcp')
loc_lon = ncclim.get_vardata('lon')
loc_lat = ncclim.get_vardata('lat')
assert np.isfinite(loc_hgt[1, 1])
isok = np.isfinite(loc_hgt)
hgt_f = loc_hgt[isok].flatten()
assert len(hgt_f) > 0.
ts_grad = np.zeros(12) + def_grad
if use_grad and len(hgt_f) >= 5:
for i in range(12):
loc_tmp_mth = loc_tmp[i, ...][isok].flatten()
slope, _, _, p_val, _ = stats.linregress(hgt_f, loc_tmp_mth)
ts_grad[i] = slope if (p_val < 0.01) else def_grad
# ... but dont exaggerate too much
ts_grad = np.clip(ts_grad, g_minmax[0], g_minmax[1])
# convert to timeserie and hydroyears
ts_grad = ts_grad.tolist()
ts_grad = ts_grad[9:] + ts_grad[0:9]
ts_grad = np.asarray(ts_grad * ny)
latdif=abs((temp.lat)-lat)
latmin=min(latdif)
latsel=np.where(latdif==latmin)
latsel=np.asarray(latsel)
latsel=latsel.squeeze()
temp = temp.isel(lat=((latsel-1), latsel, (latsel+1)))
precp = precp.isel(lat=((latsel-1), latsel, (latsel+1)))
longi = np.hstack((temp.lon[:73], (temp.lon[73:]-360)))
temp['lon'] = longi
precp['lon'] = longi
londif = abs(temp.lon - lon)
lonmin=min(londif)
lonsel=np.where(londif==lonmin)
lonsel=np.asarray(lonsel)
lonsel=lonsel.squeeze()
temp=temp.isel(lon=((lonsel-1), lonsel, (lonsel+1)))
precp = precp.isel(lon=((lonsel-1), lonsel, (lonsel+1)))
Ndays = np.tile([31, 30, 31, 31, 28, 31, 30, 31, 30, 31, 31, 30], (y1 - y0))
for i in range(0,3):
for j in range(0,3):
precp[: ,i, j] = precp[: ,i, j]* (60 * 60 * 24 * 1000) * Ndays # from [m/s] to [kg/m^2 per month]
temp.salem.grid
precp.salem.grid
# compute monthly anomalies
# of temp
tmp = temp
ts_tmp_avg = tmp.isel(time= (year >= 1961) & (year <= 1990)) #time line makes time based selection from the netcdf file.
ts_tmp_avg=ts_tmp_avg.groupby('time.month').mean(dim='time')
ts_tmp = temp.groupby('time.month')-ts_tmp_avg
# of precip
ts_pre_avg = precp.isel(time= (year >= 1961) & (year <= 1990)) #time line makes time based selection from the netcdf file.
ts_pre_avg= ts_pre_avg.groupby('time.month').mean(dim='time')
ts_pre = precp.groupby('time.month')-ts_pre_avg
# interpolate to HR grid
if np.any(~np.isfinite(ts_tmp[:, 1, 1])):
# Extreme case, middle pix is not valid
# take any valid pix from the 3*3 (and hope there's one)
found_it = False
for idi in range(2):
for idj in range(2):
if np.all(np.isfinite(ts_tmp[:, idj, idi])):
ts_tmp[:, 1, 1] = ts_tmp[:, idj, idi]
ts_pre[:, 1, 1] = ts_pre[:, idj, idi]
found_it = True
if not found_it:
msg = '{}: OMG there is no climate data'.format(gdir.rgi_id)
raise RuntimeError(msg)
elif np.any(~np.isfinite(ts_tmp)):
# maybe the side is nan, but we can do nearest
ts_tmp = ncclim.grid.map_gridded_data(ts_tmp.values, temp.salem.grid,
interp='nearest')
ts_pre = ncclim.grid.map_gridded_data(ts_pre.values, precp.salem.grid,
interp='nearest')
else:
# We can do bilinear
ts_tmp = ncclim.grid.map_gridded_data(ts_tmp.values, temp.salem.grid, interp='linear')
ts_pre = ncclim.grid.map_gridded_data(ts_pre.values, precp.salem.grid,
interp='nearest')
# take the center pixel and add it to the CRU CL clim
# for temp
loc_tmp = xr.DataArray(loc_tmp[:, 1, 1], dims=['month'],
coords={'month':ts_tmp_avg.month})
ts_tmp = xr.DataArray(ts_tmp[:, 1, 1], dims=['time'],
coords={'time':time})
ts_tmp = ts_tmp.groupby('time.month') + loc_tmp
# for prcp
loc_pre = xr.DataArray(loc_pre[:, 1, 1], dims=['month'],
coords={'month':ts_pre_avg.month})
ts_pre = xr.DataArray(ts_pre[:, 1, 1], dims=['time'],
coords={'time':time})
ts_pre = ts_pre.groupby('time.month') + loc_pre
# load dates in right format to save
Tindex=salem.GeoNetcdf(fpathTemp, monthbegin=True)
time1 = Tindex.variables['time']
time2 = time1[9:-3] - Ndays # from normal years to hydrological years
print('time1.units', time1.units)
time2 = netCDF4.num2date(time2, time1.units, calendar='noleap')
# done
loc_hgt = loc_hgt[1, 1]
loc_lon = loc_lon[1]
loc_lat = loc_lat[1]
assert np.isfinite(loc_hgt)
assert np.all(np.isfinite(ts_pre.values))
assert np.all(np.isfinite(ts_tmp.values))
assert np.all(np.isfinite(ts_grad))
gdir.write_gcm_data_file(time2, ts_pre.values, ts_tmp.values,
ts_grad, loc_hgt, loc_lon, loc_lat)
ncclim._nc.close()
Tindex._nc.close()
# metadata
out = {'climate_source': 'GCM data', 'hydro_yr_0': y0 + 1, 'hydro_yr_1': y1}
gdir.write_pickle(out, 'climate_info')`
It is still really unflexible, but I thought it is a good moment to share it anyway. To test if it was working I let is overwrite the monthly_climate.nc file, instead of creating a gcm_data.nc file. With the function:
def write_gcm_data_file(self, time, prcp, temp, grad, ref_pix_hgt,
ref_pix_lon, ref_pix_lat):
# overwrite as default
# fpath = self.get_filepath('gcm_data')
fpath = self.get_filepath('climate_monthly')
if os.path.exists(fpath):
os.remove(fpath)
with netCDF4.Dataset(fpath, 'w', format='NETCDF4') as nc:
nc.ref_hgt = ref_pix_hgt
nc.ref_pix_lon = ref_pix_lon
nc.ref_pix_lat = ref_pix_lat
nc.ref_pix_dis = haversine(self.cenlon, self.cenlat,
ref_pix_lon, ref_pix_lat)
dtime = nc.createDimension('time', None)
nc.author = 'OGGM'
nc.author_info = 'Open Global Glacier Model'
timev = nc.createVariable('time','i4',('time',))
timev.setncatts({'units':'days since 0850-01-01 00:00:00'})
timev[:] = netCDF4.date2num([t for t in time],
'days since 0850-01-01 00:00:00')
v = nc.createVariable('prcp', 'f4', ('time',), zlib=True)
v.units = 'kg m-2'
v.long_name = 'total monthly precipitation amount'
v[:] = prcp
v = nc.createVariable('temp', 'f4', ('time',), zlib=True)
v.units = 'degC'
v.long_name = '2m temperature at height ref_hgt'
v[:] = temp
v = nc.createVariable('grad', 'f4', ('time',), zlib=True)
v.units = 'degC m-1'
v.long_name = 'temperature gradient'
v[:] = grad``
Which is almost a complete copy of 'write_monthly_climate_file', and there for I put it just below that function in utils.py.
In order to be able to save the file as gcm_data.nc, I added 2 line in the cfg.py:
_doc = 'The monthly GCM climate timeseries for this glacier, stored in a netCDF ' \ 'file.' BASENAMES['gcm_data'] = ('gcm_data.nc', _doc)
And for convenience I added the following line in the task.py:
from oggm.core.preprocessing.climate import process_gcm_data
So this is the status for now. For it to make sense to include it in the code base I should make it more flexible I think. Next to that there might be some other things that need to be changed. I don't know if it would be better to discuss this further here or leave it for the oggm workshop, both is fine for me.
Thanks for your work! Here an answer to your question:
For it to make sense to include it in the code base I should make it more flexible I think.
Not necessarily. Currently this function would be used by you and only you, and there is a rule in programing: don't try to solve problem you don't know about (yet). (this goes together with "if it's not broken, don't fix it") ;-)
So I would welcome a PR where we can discuss your code in more detail. Whether or not we will manage to merge this before the workshop I don't know, but at least we can get started.
That is a funny rule, but it indeed makes sense. I will make a PR on Monday :)
Have a nice weekend!
When my climate file starts in 1600 (or earlier) the years are not recognized as years any more.
yrs = nc_ts.time.year
AttributeError: 'Index' object has no attribute 'year'
I'm planning to make some changes in my fork, to solve this problem. Would it make sense to make a pull request of these chances later on?