ECMWFCode4Earth / ml_drought

Machine learning to better predict and understand drought. Moving to github.com/ml-clim
https://ml-clim.github.io/drought-prediction/

WIP: Preprocessing #21

Closed: gabrieltseng closed this 5 years ago

gabrieltseng commented 5 years ago

Preprocessing the CHIRPS dataset (and other datasets so they play well with CHIRPS)

tommylees112 commented 5 years ago

[Screenshot 2019-05-29 at 23:53:40]

Can we run some checks/tests that the output data comes out in a sensible shape? It may just be that my data wasn't in the right format: I took an already preprocessed dataset (without the regridding) and applied the new methods to it.

That might have caused these issues, but I want to check that this won't happen to preprocessed data in the future.
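A minimal sketch of such a check, assuming the preprocessed output is a 2-D grid indexed as `data[lat, lon]` with 1-D latitude and longitude arrays in degrees. The helper name `check_grid` is hypothetical and not part of this PR. It returns a list of problems (empty if the grid looks sensible): a shape that does not match the coordinates, out-of-range or non-monotonic coordinates, or an all-NaN result.

```python
import numpy as np


def check_grid(data, lat, lon):
    """Return a list of problems found in a 2-D (lat, lon) grid; empty means it looks sensible.

    Hypothetical helper: assumes `data` is indexed as data[lat, lon] and that
    `lat`/`lon` are 1-D arrays of coordinates in degrees.
    """
    data, lat, lon = np.asarray(data), np.asarray(lat), np.asarray(lon)
    problems = []
    # Shape must line up with the coordinate lengths (catches transposed output)
    if data.shape != (lat.size, lon.size):
        problems.append(f"shape {data.shape} != (len(lat), len(lon)) = {(lat.size, lon.size)}")
    # Coordinate values must be physically plausible
    if lat.size and (lat.min() < -90 or lat.max() > 90):
        problems.append("latitude outside [-90, 90]")
    if lon.size and (lon.min() < -180 or lon.max() > 360):
        problems.append("longitude outside [-180, 360]")
    # Coordinates must be strictly increasing or strictly decreasing
    for name, coord in (("lat", lat), ("lon", lon)):
        steps = np.diff(coord)
        if not (np.all(steps > 0) or np.all(steps < 0)):
            problems.append(f"{name} is not strictly monotonic")
    # An all-NaN grid usually means something upstream went wrong
    if data.size and np.all(np.isnan(data)):
        problems.append("all values are NaN")
    return problems


lat = np.linspace(10, -10, 5)
lon = np.linspace(30, 40, 4)
print(check_grid(np.zeros((5, 4)), lat, lon))  # no problems
print(check_grid(np.zeros((4, 5)), lat, lon))  # reports the shape mismatch
```

Running a check like this on every preprocessed file would flag a transposed or malformed grid before it reaches the models.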

tommylees112 commented 5 years ago

May have found our problem lol:

[Screenshot 2019-05-30 at 00:44:56]

`data/raw/vhi/1981/VHP.G04.C07.NC.P1981035.VH.nc`

```python
import xarray as xr

d = xr.open_dataset('data/raw/vhi/1981/VHP.G04.C07.NC.P1981035.VH.nc')
d
```

```
<xarray.Dataset>
Dimensions:  (HEIGHT: 3616, WIDTH: 10000)
Dimensions without coordinates: HEIGHT, WIDTH
Data variables:
    VCI      (HEIGHT, WIDTH) float32 ...
    TCI      (HEIGHT, WIDTH) float32 ...
    VHI      (HEIGHT, WIDTH) float32 ...
    QA       (HEIGHT, WIDTH) float32 ...
Attributes:
    VERSION:                 VH (vh.exe,version 1.3, March 21 2012)
    SATELLITE:               NC
    INSTRUMENT:              AVHRR
    CITATION_TO_DOCUMENTS:   User Guide of Vegetation Health(VH) system (vers...
    CONTACT:                 NOAA/NESDIS/STAR/EMB
    PRODUCT_NAME:            Vegetation Health
    PROJECTION:              Plate_Carree
    DATE_BEGIN:              239
    DATE_END:                245
    TIME_BEGIN:              00:00 UTC (use day time data only)
    TIME_END:                23:59 UTC (use day time data only)
    ANCILLARY_FILES:         FILE_CONFIGURE:vh.config_NN\nFILE_PRELAUNCH_CALI...
    CONFIGURE_FILE_CONTENT:  [Options for vh.exe]\nDIR_Ancillary=            ...
    YEAR:                    1981
    PERIOD_OF_YEAR:          35
    DAYS_PER_PERIOD:         7
    END_LATITUDE_RANGE:      -55.152
    START_LONGITUDE_RANGE:   -180.0
    START_LATITUDE_RANGE:    75.024
    END_LONGITUDE_RANGE:     180.0
    INPUT_FILES:             2
    INPUT_FILENAMES:         data/AVHRR_VHP/4km/VH/VHP.G04.C07.NC.P1981035.SM...
```

```python
d.VHI.plot()
```
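The attributes above show how the raw grid is laid out: rows start at `START_LATITUDE_RANGE` (75.024) and end at `END_LATITUDE_RANGE` (-55.152), so latitude decreases as the `HEIGHT` index increases. A small sketch that computes the per-row and per-column step from these attributes and the array shape, assuming the `*_RANGE` attributes describe the outer edges of the grid (an assumption; the file does not say whether they are edges or cell centres). The helper `grid_steps` is hypothetical.

```python
def grid_steps(attrs, height, width):
    """Return (lat_step, lon_step) in degrees per row/column.

    Assumes the *_RANGE attributes give the outer edges of the grid.
    A negative lat_step means row 0 is the northernmost row.
    """
    lat_step = (attrs["END_LATITUDE_RANGE"] - attrs["START_LATITUDE_RANGE"]) / height
    lon_step = (attrs["END_LONGITUDE_RANGE"] - attrs["START_LONGITUDE_RANGE"]) / width
    return lat_step, lon_step


# Values copied from the Dataset repr above
attrs = {
    "START_LATITUDE_RANGE": 75.024,
    "END_LATITUDE_RANGE": -55.152,
    "START_LONGITUDE_RANGE": -180.0,
    "END_LONGITUDE_RANGE": 180.0,
}
lat_step, lon_step = grid_steps(attrs, height=3616, width=10000)
print(round(lat_step, 6), round(lon_step, 6))  # -0.036 0.036
```

Both steps come out at 0.036°, roughly 4 km at the equator, which fits the `4km` in the `INPUT_FILENAMES` path. The negative latitude step is why plotting against the bare row index, which xarray draws increasing upwards, puts the north at the bottom.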
tommylees112 commented 5 years ago

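Fix: the file has no latitude/longitude coordinates (only the `HEIGHT` and `WIDTH` dimensions), and its rows run from north (`START_LATITUDE_RANGE: 75.024`) to south (`END_LATITUDE_RANGE: -55.152`). Plotting against the raw row index therefore draws the map upside down. Sorting `HEIGHT` in descending order reverses the rows:

```python
import xarray as xr

netcdf_filepath = 'data/raw/vhi/1981/VHP.G04.C07.NC.P1981035.VH.nc'
ds = xr.open_dataset(netcdf_filepath)

# Reverse the row order so the southernmost row comes first and the map plots right side up
ds = ds.sortby('HEIGHT', ascending=False)
ds.VHI.plot()
```

![Screenshot 2019-05-30 at 00 53 20](https://user-images.githubusercontent.com/21049064/58598722-66439380-8275-11e9-84c4-2462e8ecb8bf.png)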
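An alternative to sorting the bare row index is to attach real coordinates, so that plotting and any later regridding against CHIRPS work in latitude/longitude rather than row numbers. A sketch, again assuming the `*_RANGE` attributes are grid edges so that cell centres sit half a step inside them; the helper name `cell_centres` is hypothetical.

```python
import numpy as np


def cell_centres(start, end, n):
    """Centres of n equal cells spanning the edges start..end (start may be > end)."""
    step = (end - start) / n
    return start + step * (np.arange(n) + 0.5)


# Assumes the *_RANGE attributes are the outer edges of the grid
lat = cell_centres(75.024, -55.152, 3616)  # north -> south, matching the row order
lon = cell_centres(-180.0, 180.0, 10000)   # west -> east, matching the column order
print(lat[0], lat[-1], lon[0], lon[-1])
```

With xarray these arrays can be attached with `ds = ds.assign_coords(lat=('HEIGHT', lat), lon=('WIDTH', lon))`; `ds.VHI.plot(x='lon', y='lat')` then places each cell at its latitude, so the orientation no longer depends on the row order in the file.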