ECMWFCode4Earth / ml_drought

Machine learning to better predict and understand drought. Moving to github.com/ml-clim
https://ml-clim.github.io/drought-prediction/

era5POS preprocessor #48

Open tommylees112 opened 5 years ago

tommylees112 commented 5 years ago

When the era5POS data is converted to monthly, it doesn't necessarily make sense to simply resample from hourly to monthly by taking the mean:

    @staticmethod
    def resample_time(ds: xr.Dataset,
                      resample_length: str = 'M',
                      upsampling: bool = False) -> xr.Dataset:

        # TODO: would be nice to programmatically get upsampling / not
        ds = ds.sortby('time')

        resampler = ds.resample(time=resample_length)

        if not upsampling:
            return resampler.mean()
        else:
            return resampler.nearest()

Because the result is then the mean HOURLY precipitation over a month, when really we should be summing over the month.
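
To make the units point concrete, here is a minimal sketch of how a monthly mean of hourly precipitation relates to a monthly total. The function name and the 0.1 mm/hour value are hypothetical, and it assumes the hourly series has no missing time steps (only then does sum = mean × number of hours).

```python
import calendar


def monthly_total_from_hourly_mean(mean_mm_per_hour: float,
                                   year: int,
                                   month: int) -> float:
    """Convert a monthly mean of hourly precipitation (mm/hour)
    into a monthly total (mm/month).

    Assumes a complete hourly record, so sum == mean * number of hours.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    hours_in_month = 24 * days_in_month
    return mean_mm_per_hour * hours_in_month


# Hypothetical value: 0.1 mm/hour averaged over June 2019 (30 days = 720 hours)
june_total = monthly_total_from_hourly_mean(0.1, 2019, 6)
# The same hourly mean over February 2019 (28 days = 672 hours)
feb_total = monthly_total_from_hourly_mean(0.1, 2019, 2)
print(round(june_total, 6), round(feb_total, 6))
```

So the two aggregations differ by a factor of several hundred, and that factor changes slightly from month to month.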

...
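        # NOTE: `data` (the dataset name) is not yet an argument of resample_time
        if upsampling:
            return resampler.nearest()
        elif data == 'era5POS':
            # hourly precipitation: sum to get a monthly total
            return resampler.sum()
        else:
            return resampler.mean()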

Otherwise we get extremely different spatial patterns:

CHIRPS (mm/month)

[image: Screenshot 2019-06-27 at 12 16 37]

ERA5POS (mm/hour mean for each month)

[image: Screenshot 2019-06-27 at 12 16 42]

gabrieltseng commented 5 years ago

Isn't this okay if we normalize (which we do)?
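
A rough sketch of why normalization could make the mean/sum choice moot. It assumes "normalize" means z-scoring (subtract the mean, divide by the standard deviation), which the thread does not specify. The grid-cell values and the `zscore` helper are hypothetical.

```python
import statistics


def zscore(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Hypothetical mean hourly precipitation (mm/hour) at four grid cells in one month
hourly_mean = [0.05, 0.10, 0.20, 0.15]
# The same cells as monthly totals (mm/month) for a 30-day month (720 hours)
monthly_sum = [v * 720 for v in hourly_mean]

z_mean = zscore(hourly_mean)
z_sum = zscore(monthly_sum)

# A constant scale factor cancels out under z-scoring
max_diff = max(abs(a - b) for a, b in zip(z_mean, z_sum))
print(max_diff < 1e-9)
```

The cancellation is exact only for a constant factor. Across time the factor ranges from 672 to 744 hours per month, so if the normalization is computed over the time axis, the mean and the sum would give slightly different normalized series.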

tommylees112 commented 5 years ago

Hmm, yeah, you would think so. But normalizing wouldn't change the spatial patterns, and they're so vastly different that I'm a little concerned we haven't done the preprocessing correctly!

gabrieltseng commented 5 years ago

Are they so different? They have extremely different resolutions (this is only resolved at the engineering step), but the peaks in precipitation look like they are in roughly the same spot, as well as the troughs.