ECMWFCode4Earth / ml_drought

Machine learning to better predict and understand drought. Moving to github.com/ml-clim
https://ml-clim.github.io/drought-prediction/

Review - Feature selection #98

Closed cvitolo closed 4 years ago

cvitolo commented 4 years ago
tommylees112 commented 4 years ago

You initially requested a large variety of data. Could you please give details of feature selection? What variables did you decide to use and why did you not use the others?

We are currently using [precipitation, VHI, evaporation, soil moisture, temperature] to predict VHI (the Vegetation Health Index). The current data come from different sources, including CHIRPS for precipitation and GLEAM for soil moisture and evaporation, because of export times. We are also downloading alternative data from ERA5 (temperature, evaporation).

Feature selection can be done using Shapley values to test the relative contribution of each variable to the pixel prediction.
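To make the Shapley idea concrete, here is a minimal sketch that computes exact Shapley values for a single pixel by enumerating every coalition of the five input variables. It is not the project's implementation: `toy_model`, its weights, the pixel values, and the baseline are all hypothetical stand-ins, and "removing" a feature is approximated by replacing it with a baseline value.

```python
from itertools import combinations
from math import factorial

# Input variables named in this thread.
FEATURES = ["precipitation", "VHI", "evaporation", "soil_moisture", "temperature"]

# Hypothetical weights for a stand-in linear model (not the trained model).
TOY_WEIGHTS = {
    "precipitation": 0.3,
    "VHI": 0.5,
    "evaporation": -0.1,
    "soil_moisture": 0.2,
    "temperature": -0.05,
}


def toy_model(x):
    """Stand-in for the real predictor: maps one pixel's features to a VHI value."""
    return sum(TOY_WEIGHTS[name] * x[name] for name in FEATURES)


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample.

    A feature outside the coalition is set to its baseline value,
    which is one common (assumed) way to represent an absent feature.
    """
    n = len(FEATURES)
    values = {}
    for feature in FEATURES:
        others = [f for f in FEATURES if f != feature]
        total = 0.0
        for size in range(n):
            # Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without = {f: (x[f] if f in coalition else baseline[f]) for f in FEATURES}
                with_feature = dict(without)
                with_feature[feature] = x[feature]
                total += weight * (model(with_feature) - model(without))
        values[feature] = total
    return values


# Illustrative (made-up) normalised values for one pixel and a baseline.
pixel = {"precipitation": 0.8, "VHI": 0.6, "evaporation": 0.4,
         "soil_moisture": 0.7, "temperature": 0.9}
baseline = {name: 0.5 for name in FEATURES}

contributions = shapley_values(toy_model, pixel, baseline)

# Rank variables by absolute contribution to this pixel's prediction.
for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:15s} {value:+.4f}")
```

Exact enumeration costs 2^(n-1) model evaluations per feature, which is fine for five variables; for larger feature sets or per-pixel maps, a sampling-based approximation would probably be needed.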

It would also be great to have a look-up table linking each variable name as used by CDS/MARS to the variable name used in your notebooks.

The variable names should be the same, since the variable parameter takes the same name as in the cdsapi call that is generated when you build a request on the CDS website.

The variables with non-standard names (e.g. VHI) are likely from alternative data sources.
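A look-up table of the kind requested could be sketched as below. The data sources come from this thread; the CDS variable names (`2m_temperature`, `evaporation`), the dataset name, and the request fields are assumptions based on the public ERA5 single-levels catalogue, and the dates are purely illustrative. The sketch only builds the request dictionary and does not contact the CDS.

```python
# Look-up table: notebook variable name -> data source -> CDS name (if any).
# Assumption: for ERA5 variables the notebook name equals the CDS name,
# as stated in the thread; other sources have no CDS name.
LOOKUP = [
    {"notebook_name": "precipitation", "source": "CHIRPS", "cds_name": None},
    {"notebook_name": "soil moisture", "source": "GLEAM", "cds_name": None},
    {"notebook_name": "evaporation", "source": "GLEAM", "cds_name": None},
    {"notebook_name": "VHI", "source": "alternative source (not named here)", "cds_name": None},
    {"notebook_name": "2m_temperature", "source": "ERA5", "cds_name": "2m_temperature"},
    {"notebook_name": "evaporation", "source": "ERA5", "cds_name": "evaporation"},
]


def cds_variables(lookup):
    """Return the CDS variable names for rows that come from the CDS."""
    return [row["cds_name"] for row in lookup if row["cds_name"] is not None]


# Request in the shape the CDS website generates for cdsapi.
# Dataset name and field values are assumptions for illustration only.
dataset = "reanalysis-era5-single-levels"
request = {
    "product_type": "reanalysis",
    "variable": cds_variables(LOOKUP),
    "year": "2018",
    "month": "01",
    "day": "01",
    "time": "12:00",
    "format": "netcdf",
}

# With credentials configured, this would be passed on as:
#   cdsapi.Client().retrieve(dataset, request, "era5_sample.nc")

for row in LOOKUP:
    print(f"{row['notebook_name']:15s} {row['source']:40s} {row['cds_name']}")
```

Keeping the table as plain data makes it easy to render in a notebook and to reuse when building requests.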