DOI-BOR / PyForecast

PyForecast is a statistical modeling tool used by Reclamation water managers and reservoir operators to train and build predictive models for seasonal inflows and streamflows. PyForecast allows users to make current water-year forecasts using models developed with the program.

Backward model selection issues #36

Open jslanini opened 5 years ago

jslanini commented 5 years ago

Creates models with a ton of variables. Crashes when models are selected.

jslanini commented 5 years ago

Note: with PCA regression

jslanini commented 5 years ago

z score also crashes

tjrocha commented 5 years ago

Can confirm with hebgen.fcst file. Looking into it...

In previous tests with a small number of predictors, it appeared to work correctly and even arrived at the same solutions as the Forward-Selection algorithm.

tjrocha commented 5 years ago

Creates models with a ton of variables

  • After stepping through the code, it seems likely that the backward feature selection algorithm (and the forward feature selection algorithm, for that matter) is settling into a local min/max, depending on which model scoring method is selected. Near the end of the selection process, the code attempts to dig out of these local optima by removing 1 and adding 2 predictors in the forward selection algorithm, and by adding 1 and removing 2 predictors in the backward selection algorithm. For sufficiently large predictor sets, however, adding/removing even 2 at a time may not be enough to escape the local min/max. As a result, there may be a bias towards fewer features (predictors) in the forward selection case and a bias towards more features in the backward selection case.
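
To illustrate the local-optimum behavior described above, here is a minimal sketch. It is not PyForecast's code: `toy_score`, `backward_select`, `exhaustive_best`, and the six-predictor setup are all hypothetical, and the score is deliberately constructed so that predictors 2, 3, and 4 only pay off as a group. It assumes a higher score is better and that the "add 1, remove 2" step is tried only once single removals stop helping.

```python
from itertools import combinations

GROUP = frozenset({2, 3, 4})  # hypothetical predictors that only help together


def toy_score(subset):
    """Made-up integer score (higher is better), NOT a PyForecast metric."""
    signal = 10 if {0, 1} <= subset else 0      # predictors 0 and 1 carry the signal
    k = len(GROUP & subset)
    group = 2 if k == 3 else (-5 if k else 0)   # a partial group is penalized
    return signal + group - len(subset)         # each predictor costs 1


def remove_one(current):
    """All subsets reachable by dropping a single predictor."""
    return [current - {i} for i in current if len(current) > 1]


def add_one_remove_two(current, all_predictors):
    """All subsets reachable by adding one excluded predictor and dropping two."""
    moves = []
    for added in all_predictors - current:
        for pair in combinations(sorted(current), 2):
            moves.append((current | {added}) - set(pair))
    return moves


def backward_select(all_predictors, score):
    """Greedy backward elimination with an add-1/remove-2 escape step."""
    current = frozenset(all_predictors)
    best = score(current)
    while True:
        top = max(remove_one(current), key=score, default=None)
        if top is None or score(top) <= best:
            # Single removals stalled, so try the escape move.
            escape = add_one_remove_two(current, frozenset(all_predictors))
            top = max(escape, key=score, default=None)
            if top is None or score(top) <= best:
                return current, best
        current, best = top, score(top)


def exhaustive_best(all_predictors, score):
    """Brute-force search over every non-empty subset, for comparison only."""
    items = sorted(all_predictors)
    subsets = (frozenset(c) for r in range(1, len(items) + 1)
               for c in combinations(items, r))
    winner = max(subsets, key=score)
    return winner, score(winner)


predictors = frozenset(range(6))
greedy_set, greedy_score = backward_select(predictors, toy_score)
global_set, global_score = exhaustive_best(predictors, toy_score)
print(sorted(greedy_set), greedy_score)  # [0, 1, 2, 3, 4] 7
print(sorted(global_set), global_score)  # [0, 1] 8
```

In this toy case the greedy search stops with five predictors even though a two-predictor model scores higher, because reaching it would require dropping three predictors at once. That matches the suspected bias towards more features in backward selection, though the real cause in PyForecast may differ.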

Crashes when models are selected.

  • Still looking into this...

tjrocha commented 5 years ago

Crashing bug is fixed.