ECMWFCode4Earth / ml_drought

Machine learning to better predict and understand drought. Moving to github.com/ml-clim
https://ml-clim.github.io/drought-prediction/

preprocess/adede_data #151

Closed tommylees112 closed 4 years ago

tommylees112 commented 4 years ago

Create a DataFrame like the one in the Adede paper. This lets us reproduce their results and produce useful tabular outputs for other modellers and users of the pipeline.

Alternatively, it would also be good to write the data explicitly to a format that our models can use.

Also, can we get the engineer to take in xarray objects directly? Instead of relying on self.get_dataset, we could pass an xr.Dataset directly.
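A minimal sketch of what that could look like, assuming a simplified stand-in class. EngineerSketch, its data_dir argument and the body of get_dataset are hypothetical and are not the pipeline's actual engineer API; the only point is the optional ds argument with a fallback to loading from disk.

```python
from pathlib import Path
from typing import Optional

import pandas as pd
import xarray as xr


class EngineerSketch:
    """Hypothetical stand-in for the pipeline's engineer (not the real class)."""

    def __init__(self, data_dir: Path = Path("data")) -> None:
        self.data_dir = data_dir

    def get_dataset(self) -> xr.Dataset:
        # Assumed current behaviour: load preprocessed NetCDF files from disk.
        files = sorted(self.data_dir.glob("**/*.nc"))
        return xr.open_mfdataset(files)

    def engineer(self, ds: Optional[xr.Dataset] = None) -> xr.Dataset:
        # Use the dataset passed in by the caller; only read from disk if none is given.
        if ds is None:
            ds = self.get_dataset()
        # Placeholder for the real engineering step: here we only sort by time.
        return ds.sortby("time")


# Usage: pass an in-memory xr.Dataset directly, with no files involved.
times = pd.to_datetime(["2000-03-31", "2000-02-29", "2000-01-31"])
ds = xr.Dataset({"VCI1M": ("time", [3.0, 2.0, 1.0])}, coords={"time": times})
out = EngineerSketch().engineer(ds)
```

Keeping ds optional means existing callers that rely on get_dataset would not need to change.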

1) Create the xr.Dataset of the raw variables used in the Adede paper

In [113]: out_ds
Out[113]:
<xarray.Dataset>
Dimensions:  (lat: 45, lon: 35, time: 447)
Coordinates:
  * time     (time) datetime64[ns] 1981-10-31 1981-11-30 ... 2018-12-31
  * lat      (lat) float64 -5.0 -4.75 -4.5 -4.25 -4.0 ... 5.0 5.25 5.5 5.75 6.0
  * lon      (lon) float64 33.75 34.0 34.25 34.5 34.75 ... 41.5 41.75 42.0 42.25
Data variables:
    RCI1M    (time, lat, lon) float64 3.321 3.28 1.896 1.773 ... nan nan nan nan
    RCI3M    (time, lat, lon) float64 4.824 5.148 5.119 4.887 ... nan nan nan
    RFE1M    (time, lat, lon) float64 11.37 10.98 7.303 6.162 ... nan nan nan
    RFE3M    (time, lat, lon) float64 11.72 11.91 12.18 10.92 ... nan nan nan
    SPI1     (time, lat, lon) float64 -0.5489 -0.4763 -0.4513 ... nan nan nan
    SPI3     (time, lat, lon) float64 -0.4601 -0.4373 -0.401 ... nan nan nan
    VCI1M    (time, lat, lon) float64 50.11 15.01 87.42 ... 21.15 60.27 22.19
    VCI3M    (time, lat, lon) float64 49.11 15.0 63.55 61.26 ... nan nan nan nan

2) Create a pd.DataFrame of the time series for each of the regions
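One way step 2 could be sketched is shown below, assuming each pixel is assigned to a region through an integer mask on the same lat/lon grid. The mask, the helper name regional_timeseries and the tiny synthetic dataset are illustrative assumptions, not the code in the repository.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for out_ds (same dimension names, much smaller grid).
time = pd.to_datetime(["2018-10-31", "2018-11-30", "2018-12-31"])
lat = np.array([-5.0, -4.75])
lon = np.array([33.75, 34.0])
values = np.arange(12, dtype=float).reshape(3, 2, 2)
ds = xr.Dataset(
    {"VCI1M": (("time", "lat", "lon"), values)},
    coords={"time": time, "lat": lat, "lon": lon},
)

# Hypothetical region mask: one integer region id per pixel.
region = xr.DataArray(
    np.array([[0, 0], [1, 1]]),
    coords={"lat": lat, "lon": lon},
    dims=("lat", "lon"),
    name="region",
)


def regional_timeseries(ds: xr.Dataset, region: xr.DataArray) -> pd.DataFrame:
    """Return one row per (region, time) holding the spatial mean of each variable."""
    frames = []
    for region_id in np.unique(region.values):
        # Mask pixels outside the region, then average over space (NaNs are skipped).
        regional_mean = ds.where(region == region_id).mean(dim=["lat", "lon"])
        frame = regional_mean.to_dataframe().reset_index()
        frame.insert(0, "region", int(region_id))
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


df = regional_timeseries(ds, region)
```

The result is in long format, with one column per variable plus region and time. It can be written out with df.to_csv for other users of the pipeline.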

CHANGES:

1) The script in scripts/drafts/adede_variables.py used for calculating the above
2) New index in src/analysis/indices/condition_index.py
3) Update comments in other files
4) Fix the tests for the SPI index
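For orientation, a condition index such as VCI is commonly defined as a min-max scaling of each pixel against its own historical range, giving values between 0 and 100. The sketch below assumes that common definition and scales over the whole record. The function name and the input values are illustrative, and the implementation in condition_index.py may differ, for example by scaling per calendar month.

```python
import numpy as np
import xarray as xr


def condition_index(da: xr.DataArray, dim: str = "time") -> xr.DataArray:
    """Scale each pixel to 0-100 relative to its own min and max over `dim`."""
    da_min = da.min(dim=dim)
    da_max = da.max(dim=dim)
    # A pixel whose value never changes has max == min, so its index becomes NaN.
    return 100 * (da - da_min) / (da_max - da_min)


# Arbitrary raw values for two pixels over three time steps.
raw = xr.DataArray(
    np.array([[2.0, 1.0], [4.0, 3.0], [6.0, 2.0]]),
    dims=("time", "pixel"),
)
ci = condition_index(raw)
```

Here the first pixel rises steadily and maps to 0, 50 and 100, while the second peaks at the middle time step and maps to 0, 100 and 50.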

TODO