OGGM / oggm

Open Global Glacier Model
http://oggm.org
BSD 3-Clause "New" or "Revised" License

Time limit in process_custom_climate_data #157

Closed: anoukvlug closed this issue 6 years ago

anoukvlug commented 7 years ago

When my climate file starts in 1600 (or earlier), the years are no longer recognized as years:

yrs = nc_ts.time.year
AttributeError: 'Index' object has no attribute 'year'

I'm planning to make some changes in my fork to solve this problem. Would it make sense to open a pull request with these changes later on?

fmaussion commented 7 years ago

Hi Anouk,

This is getting quite interesting, I'd like to know more about your plans ;-) But first, to your question:

I expect that solving this problem won't be trivial, as there is an underlying problem in pandas and xarray (see e.g. https://github.com/pydata/xarray/issues/789 for xarray, which relies on pandas: https://github.com/pandas-dev/pandas/issues/7307 ).
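For reference, the root of the problem is the fixed range of pandas' nanosecond timestamps; a quick illustration:

import pandas as pd

# pandas stores datetimes as nanoseconds in a 64-bit integer, which
# limits the representable range to roughly 1677-2262:
print(pd.Timestamp.min)  # 1677-09-21 00:12:43.145224193
print(pd.Timestamp.max)  # 2262-04-11 23:47:16.854775807

# earlier dates cannot be decoded into a DatetimeIndex, so xarray
# falls back to a plain object Index of datetime objects, which has
# no .year attribute - hence the AttributeError above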

OGGM does not rely heavily on pandas datetimes, so I think we could solve this problem without getting burned in a hell of changes, but off the top of my head I can't really predict how much change this would imply.

In fact, the dynamical model of OGGM already relies on a very simple time system (https://github.com/OGGM/oggm/blob/master/oggm/utils.py#L675-L718), which could be improved and extended (based on other people's systems, as I hope we don't have to reinvent the wheel here).
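For illustration, a minimal sketch of such a float-year scheme (not OGGM's actual implementation): the year lives in the integer part and the month in the fractional part, with no pandas datetimes involved, so year 850 is no harder than year 1950:

def date_to_floatyear(y, m):
    """Encode (year, month) as a float, e.g. (1600, 7) -> 1600.5."""
    return y + (m - 1) / 12

def floatyear_to_date(yr):
    """Decode the float back to (year, month)."""
    y = int(yr)
    m = int(round((yr - y) * 12)) + 1
    return y, m

assert floatyear_to_date(date_to_floatyear(850, 10)) == (850, 10)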

I know this is not very helpful, but yes, a change that would make OGGM work on longer timescales is necessary (I just didn't have that in mind when I started :-( )

anoukvlug commented 7 years ago

Hello Fabien,

Thank you for your quick response. My first idea is to change

yrs = nc_ts.time.year
y0, y1 = yrs[0], yrs[-1]

into

time1 = nc_ts.variables['time']
time1 = netCDF4.num2date(time1[:], time1.units)
y0 = int(time1[0].strftime('%Y'))
y1 = int(time1[-1].strftime('%Y'))

to avoid using pandas.
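As a side note, the objects returned by num2date expose the year directly, which would avoid the strftime round-trip (a small untested simplification of the snippet above):

import netCDF4

time1 = nc_ts.variables['time']
time1 = netCDF4.num2date(time1[:], time1.units)
# num2date returns datetime-like objects with a .year attribute,
# so no string formatting and re-parsing is needed:
y0, y1 = time1[0].year, time1[-1].year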

I don't know yet what to do with these lines:

nc_ts.set_period(t0='{}-10-01'.format(y0), t1='{}-09-01'.format(y1))
gdir.write_monthly_climate_file(time, iprcp, itemp, igrad, ihgt,
                                ref_pix_lon, ref_pix_lat)

So far these are the only lines that have given me problems with the starting date, so hopefully this requires just a small change. I will let you know once I know more about what I want to do with those lines.

Briefly summarised: I would like to run OGGM over the last ~1000 years with different climate simulations of the Community Earth System Model Last Millennium Ensemble (CESM-LME), to look at the influence of natural climate variability on glaciers in the Canadian Arctic. Later in my PhD I would like to use OGGM to make runs over the second half of the Holocene.

anoukvlug commented 7 years ago

gdir.write_monthly_climate_file(time, iprcp, itemp, igrad, ihgt, ref_pix_lon, ref_pix_lat)

does not give a problem with the proper input file after all. I thought you could set the 'days since' reference date yourself, but if you just use 'days since 1801-01-01' then it all works fine.
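As a side note, netCDF4 itself treats the stored time values as plain offsets from whatever epoch the units string declares, so the choice of reference date should not matter at that level (a small sketch):

import netCDF4

# both declarations describe the same instant, 1 January 1802,
# as an offset from two different epochs:
print(netCDF4.num2date(365, 'days since 1801-01-01'))
print(netCDF4.num2date(0, 'days since 1802-01-01'))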

fmaussion commented 7 years ago

Briefly summarised: I would like to run OGGM over the last ~1000 years with different climate simulations of the Community Earth System Model Last Millennium Ensemble (CESM-LME)

Right, I see. So there is good news and bad news about that ;-)

anoukvlug commented 7 years ago

Here follows an update on the conversation we had on Slack:

I made a 'process_gcm_data' task in climate.py, which is based on a copy of the 'process_cru_data' task.

@entity_task(log, writes=['gcm_data'])
def process_gcm_data(gdir):

    # read the CRU CL2 high resolution climatologies
    clfile = utils.get_cru_cl_file()
    ncclim = salem.GeoNetcdf(clfile)

    # GCM temperature and precipitation data
    if not (('gcm_temp_file' in cfg.PATHS) and
            os.path.exists(cfg.PATHS['gcm_temp_file'])):
        raise IOError('GCM temp file not found')

    if not (('gcm_precc_file' in cfg.PATHS) and
            os.path.exists(cfg.PATHS['gcm_precc_file'])):
        raise IOError('GCM precc file not found')

    if not (('gcm_precl_file' in cfg.PATHS) and
            os.path.exists(cfg.PATHS['gcm_precl_file'])):
        raise IOError('GCM precl file not found')

    # read the files
    fpathTemp = cfg.PATHS['gcm_temp_file']
    fpathPrecc = cfg.PATHS['gcm_precc_file']
    fpathPrecl = cfg.PATHS['gcm_precl_file']
    temp = salem.open_xr_dataset(fpathTemp)
    temp = temp.TREFHT
    temp = temp[9:-3, :, :]  # from normal years to hydrological years
    precpC = xr.open_dataset(fpathPrecc)
    precpL = xr.open_dataset(fpathPrecl)
    precp = precpC.PRECC + precpL.PRECL
    precp = precp[9:-3, :, :]  # from normal years to hydrological years

    # replace the time axis: monthly periods are not limited to the
    # 1677-2262 range of nanosecond timestamps
    temp['time'] = pd.period_range('850-10', '2005-9', freq='M')
    precp['time'] = pd.period_range('850-10', '2005-9', freq='M')

    # use the year and month for selection (temp and precp share the
    # same time axis, so computing this once is enough)
    year = np.array([t.year for t in temp.time.values])
    month = np.array([t.month for t in temp.time.values])
    time = temp.time

    y0 = year[0]
    y1 = year[-1]

    ny, r = divmod(len(temp.time), 12)
    assert r == 0

    # gradient default params
    use_grad = cfg.PARAMS['temp_use_local_gradient']
    def_grad = cfg.PARAMS['temp_default_gradient']
    g_minmax = cfg.PARAMS['temp_local_gradient_bounds']

    lon = gdir.cenlon
    lat = gdir.cenlat

    # This is guaranteed to work because I prepared the file (I hope)
    ncclim.set_subset(corners=((lon, lat), (lon, lat)), margin=1)

    # get climatology data
    loc_hgt = ncclim.get_vardata('elev')
    loc_tmp = ncclim.get_vardata('temp')
    loc_pre = ncclim.get_vardata('prcp')
    loc_lon = ncclim.get_vardata('lon')
    loc_lat = ncclim.get_vardata('lat')

    # see if the center is ok
    if not np.isfinite(loc_hgt[1, 1]):
        # take another candidate where finite
        isok = np.isfinite(loc_hgt)

        # wait: some areas are entirely NaNs, make the subset larger
        _margin = 1
        while not np.any(isok):
            _margin += 1
            ncclim.set_subset(corners=((lon, lat), (lon, lat)),
                              margin=_margin)
            loc_hgt = ncclim.get_vardata('elev')
            isok = np.isfinite(loc_hgt)
        if _margin > 1:
            log.debug('%s: I had to look up for far climate pixels: %s',
                      gdir.rgi_id, _margin)

        # Take the first candidate (doesn't matter which)
        lon, lat = ncclim.grid.ll_coordinates
        lon = lon[isok][0]
        lat = lat[isok][0]
        # Resubset
        ncclim.set_subset()
        ncclim.set_subset(corners=((lon, lat), (lon, lat)), margin=1)
        loc_hgt = ncclim.get_vardata('elev')
        loc_tmp = ncclim.get_vardata('temp')
        loc_pre = ncclim.get_vardata('prcp')
        loc_lon = ncclim.get_vardata('lon')
        loc_lat = ncclim.get_vardata('lat')

    assert np.isfinite(loc_hgt[1, 1])
    isok = np.isfinite(loc_hgt)
    hgt_f = loc_hgt[isok].flatten()
    assert len(hgt_f) > 0
    ts_grad = np.zeros(12) + def_grad
    if use_grad and len(hgt_f) >= 5:
        for i in range(12):
            loc_tmp_mth = loc_tmp[i, ...][isok].flatten()
            slope, _, _, p_val, _ = stats.linregress(hgt_f, loc_tmp_mth)
            ts_grad[i] = slope if (p_val < 0.01) else def_grad
    # ... but don't exaggerate too much
    ts_grad = np.clip(ts_grad, g_minmax[0], g_minmax[1])
    # convert to a timeseries over hydrological years
    ts_grad = ts_grad.tolist()
    ts_grad = ts_grad[9:] + ts_grad[0:9]
    ts_grad = np.asarray(ts_grad * ny)

    # select the 3x3 GCM grid points closest to the glacier
    latdif = abs(temp.lat - lat)
    latsel = np.asarray(np.where(latdif == min(latdif))).squeeze()
    temp = temp.isel(lat=((latsel - 1), latsel, (latsel + 1)))
    precp = precp.isel(lat=((latsel - 1), latsel, (latsel + 1)))

    # wrap the GCM longitudes from [0, 360] to [-180, 180]
    longi = np.hstack((temp.lon[:73], (temp.lon[73:] - 360)))
    temp['lon'] = longi
    precp['lon'] = longi
    londif = abs(temp.lon - lon)
    lonsel = np.asarray(np.where(londif == min(londif))).squeeze()
    temp = temp.isel(lon=((lonsel - 1), lonsel, (lonsel + 1)))
    precp = precp.isel(lon=((lonsel - 1), lonsel, (lonsel + 1)))

    # from [m/s] to [kg m-2 per month]: water density (1000 kg m-3)
    # times seconds per day times days per month (noleap calendar,
    # hydrological-year order starting in October)
    Ndays = np.tile([31, 30, 31, 31, 28, 31, 30, 31, 30, 31, 31, 30],
                    (y1 - y0))
    for i in range(0, 3):
        for j in range(0, 3):
            precp[:, i, j] = precp[:, i, j] * (60 * 60 * 24 * 1000) * Ndays

    # make sure salem can parse the (shifted) grids
    temp.salem.grid
    precp.salem.grid

    # compute monthly anomalies w.r.t. the 1961-1990 reference period
    # of temp
    ts_tmp_avg = temp.isel(time=(year >= 1961) & (year <= 1990))
    ts_tmp_avg = ts_tmp_avg.groupby('time.month').mean(dim='time')
    ts_tmp = temp.groupby('time.month') - ts_tmp_avg

    # of precip
    ts_pre_avg = precp.isel(time=(year >= 1961) & (year <= 1990))
    ts_pre_avg = ts_pre_avg.groupby('time.month').mean(dim='time')
    ts_pre = precp.groupby('time.month') - ts_pre_avg

    # interpolate to HR grid
    if np.any(~np.isfinite(ts_tmp[:, 1, 1])):
        # Extreme case, middle pix is not valid
        # take any valid pix from the 3x3 (and hope there's one)
        found_it = False
        for idi in range(3):
            for idj in range(3):
                if np.all(np.isfinite(ts_tmp[:, idj, idi])):
                    ts_tmp[:, 1, 1] = ts_tmp[:, idj, idi]
                    ts_pre[:, 1, 1] = ts_pre[:, idj, idi]
                    found_it = True
        if not found_it:
            msg = '{}: OMG there is no climate data'.format(gdir.rgi_id)
            raise RuntimeError(msg)
    elif np.any(~np.isfinite(ts_tmp)):
        # maybe the side is nan, but we can do nearest
        ts_tmp = ncclim.grid.map_gridded_data(ts_tmp.values, temp.salem.grid,
                                              interp='nearest')
        ts_pre = ncclim.grid.map_gridded_data(ts_pre.values, precp.salem.grid,
                                              interp='nearest')
    else:
        # We can do bilinear
        ts_tmp = ncclim.grid.map_gridded_data(ts_tmp.values, temp.salem.grid,
                                              interp='linear')
        ts_pre = ncclim.grid.map_gridded_data(ts_pre.values, precp.salem.grid,
                                              interp='linear')

    # take the center pixel and add it to the CRU CL clim
    # for temp
    loc_tmp = xr.DataArray(loc_tmp[:, 1, 1], dims=['month'],
                           coords={'month': ts_tmp_avg.month})
    ts_tmp = xr.DataArray(ts_tmp[:, 1, 1], dims=['time'],
                          coords={'time': time})
    ts_tmp = ts_tmp.groupby('time.month') + loc_tmp
    # for prcp
    loc_pre = xr.DataArray(loc_pre[:, 1, 1], dims=['month'],
                           coords={'month': ts_pre_avg.month})
    ts_pre = xr.DataArray(ts_pre[:, 1, 1], dims=['time'],
                          coords={'time': time})
    ts_pre = ts_pre.groupby('time.month') + loc_pre

    # load dates in the right format to save
    Tindex = salem.GeoNetcdf(fpathTemp, monthbegin=True)
    time1 = Tindex.variables['time']
    time2 = time1[9:-3] - Ndays  # from normal years to hydrological years
    time2 = netCDF4.num2date(time2, time1.units, calendar='noleap')

    # done
    loc_hgt = loc_hgt[1, 1]
    loc_lon = loc_lon[1]
    loc_lat = loc_lat[1]
    assert np.isfinite(loc_hgt)
    assert np.all(np.isfinite(ts_pre.values))
    assert np.all(np.isfinite(ts_tmp.values))
    assert np.all(np.isfinite(ts_grad))
    gdir.write_gcm_data_file(time2, ts_pre.values, ts_tmp.values,
                             ts_grad, loc_hgt, loc_lon, loc_lat)

    ncclim._nc.close()
    Tindex._nc.close()

    # metadata
    out = {'climate_source': 'GCM data', 'hydro_yr_0': y0 + 1,
           'hydro_yr_1': y1}
    gdir.write_pickle(out, 'climate_info')
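For context, here is a hypothetical driver sketch for such an entity task; the CESM-LME file names and paths are placeholders, not code from this thread:

from oggm import cfg
from oggm.core.preprocessing.climate import process_gcm_data

cfg.initialize()
cfg.PATHS['gcm_temp_file'] = '/path/to/TREFHT.nc'    # 2m air temperature
cfg.PATHS['gcm_precc_file'] = '/path/to/PRECC.nc'    # convective precipitation
cfg.PATHS['gcm_precl_file'] = '/path/to/PRECL.nc'    # large-scale precipitation

# gdirs = ... the usual list of GlacierDirectory objects ...
# the @entity_task decorator makes the task callable per glacier:
# for gdir in gdirs:
#     process_gcm_data(gdir)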

It is still really inflexible, but I thought it was a good moment to share it anyway. To test whether it was working, I let it overwrite the monthly_climate.nc file instead of creating a gcm_data.nc file, with the function:

def write_gcm_data_file(self, time, prcp, temp, grad, ref_pix_hgt,
                               ref_pix_lon, ref_pix_lat):

    # overwrite as default
    # fpath = self.get_filepath('gcm_data')
    fpath = self.get_filepath('climate_monthly')
    if os.path.exists(fpath):
        os.remove(fpath)

    with netCDF4.Dataset(fpath, 'w', format='NETCDF4') as nc:
        nc.ref_hgt = ref_pix_hgt
        nc.ref_pix_lon = ref_pix_lon
        nc.ref_pix_lat = ref_pix_lat
        nc.ref_pix_dis = haversine(self.cenlon, self.cenlat,
                                   ref_pix_lon, ref_pix_lat)

        dtime = nc.createDimension('time', None)

        nc.author = 'OGGM'
        nc.author_info = 'Open Global Glacier Model'

        timev = nc.createVariable('time','i4',('time',))
        timev.setncatts({'units':'days since 0850-01-01 00:00:00'})
        timev[:] = netCDF4.date2num([t for t in time],
                                    'days since 0850-01-01 00:00:00')

        v = nc.createVariable('prcp', 'f4', ('time',), zlib=True)
        v.units = 'kg m-2'
        v.long_name = 'total monthly precipitation amount'
        v[:] = prcp

        v = nc.createVariable('temp', 'f4', ('time',), zlib=True)
        v.units = 'degC'
        v.long_name = '2m temperature at height ref_hgt'
        v[:] = temp

        v = nc.createVariable('grad', 'f4', ('time',), zlib=True)
        v.units = 'degC m-1'
        v.long_name = 'temperature gradient'
        v[:] = grad

This is almost a complete copy of 'write_monthly_climate_file', and therefore I put it just below that function in utils.py.
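Since the test overwrites climate_monthly, a quick way to check the written file without tripping over the pandas date limits is to read the time axis back with netCDF4 (a sketch, not part of the original code):

import netCDF4

# inspect the written file without going through pandas/xarray,
# so pre-1677 dates cannot cause trouble:
with netCDF4.Dataset(gdir.get_filepath('climate_monthly')) as nc:
    t = nc.variables['time']
    print(netCDF4.num2date(t[0], t.units), netCDF4.num2date(t[-1], t.units))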

In order to be able to save the file as gcm_data.nc, I added two lines in cfg.py:

_doc = 'The monthly GCM climate timeseries for this glacier, stored in a ' \
       'netCDF file.'
BASENAMES['gcm_data'] = ('gcm_data.nc', _doc)
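With that entry registered, the new file can be addressed like any other OGGM file (sketch):

# BASENAMES maps a short key to (filename, documentation); once the
# entry exists, the standard path helper resolves it:
fpath = gdir.get_filepath('gcm_data')  # -> <glacier directory>/gcm_data.nc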

And for convenience I added the following line in tasks.py:

from oggm.core.preprocessing.climate import process_gcm_data

So this is the status for now. For it to make sense to include it in the code base, I think I should make it more flexible. Besides that, there might be some other things that need to be changed. I don't know if it would be better to discuss this further here or leave it for the OGGM workshop; both are fine for me.

fmaussion commented 7 years ago

Thanks for your work! Here is an answer to your question:

For it to make sense to include it in the code base I should make it more flexible I think.

Not necessarily. Currently this function would be used by you and only you, and there is a rule in programming: don't try to solve problems you don't know about (yet). (This goes together with "if it's not broken, don't fix it".) ;-)

So I would welcome a PR where we can discuss your code in more detail. Whether or not we will manage to merge this before the workshop I don't know, but at least we can get started.

anoukvlug commented 7 years ago

That is a funny rule, but it indeed makes sense. I will make a PR on Monday :)

Have a nice weekend!