DOI-BOR / PyForecast

PyForecast is a statistical modeling tool used by Reclamation water managers and reservoir operators to train and build predictive models for seasonal inflows and streamflows. PyForecast allows users to make current water-year forecasts using models developed with the program.

Linearity assumption-transformations and residuals #21

Open jslanini opened 5 years ago

jslanini commented 5 years ago

One of the fundamental assumptions of linear regression is... linearity. We need to let users test whether their models meet this assumption. A common approach is to plot predicted values (x-axis) against residuals (y-axis). If the linearity assumption holds, the residuals should scatter randomly around zero with no systematic pattern such as curvature.
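A minimal sketch of the predicted-vs-residuals check, assuming NumPy for the fit and matplotlib for the plot. The data are synthetic, and the names `fit_and_residuals`, `plot_residuals`, and `curvature_score` are hypothetical helpers, not part of PyForecast. The curvature score is only a rough illustrative indicator, not a formal test.

```python
import numpy as np


def fit_and_residuals(x, y):
    """Fit y = intercept + slope * x by least squares.

    Returns the fitted (predicted) values and the residuals.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    fitted = intercept + slope * x
    return fitted, y - fitted


def curvature_score(fitted, residuals):
    """Hypothetical rough indicator: correlation between the residuals
    and the squared distance of each fitted value from the mean.
    Values far from zero hint at curvature the linear model missed."""
    squared = (fitted - fitted.mean()) ** 2
    return float(np.corrcoef(squared, residuals)[0, 1])


def plot_residuals(fitted, residuals):
    """Scatter predicted values (x-axis) against residuals (y-axis)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots()
    ax.scatter(fitted, residuals)
    ax.axhline(0.0, linestyle="--", color="gray")
    ax.set_xlabel("Predicted value")
    ax.set_ylabel("Residual")
    return ax


# Synthetic example data (assumed, not from any real station).
rng = np.random.default_rng(0)
predictor = rng.uniform(5, 40, size=30)
linear_flow = 10 + 3 * predictor + rng.normal(0, 2, size=30)
curved_flow = 0.2 * predictor**2 + rng.normal(0, 2, size=30)

linear_fitted, linear_resid = fit_and_residuals(predictor, linear_flow)
curved_fitted, curved_resid = fit_and_residuals(predictor, curved_flow)
```

Calling `plot_residuals(curved_fitted, curved_resid)` should show a U-shaped pattern, while the same plot for the linear data should look like an unstructured band around zero.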

Streamflow data are frequently not normally distributed. They are often approximately lognormal, and taking the logarithm of the flows is a common transformation in forecasting. I suggest we implement transformations, starting with the log transformation. NRCS also uses square and cubic transformations, but I'm not sure of the reasoning behind them.
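A minimal sketch of a log transformation and its inverse, assuming NumPy and strictly positive flows. The function names and the synthetic data are hypothetical and only illustrate how the transformation reduces skew.

```python
import numpy as np


def log_transform(flows):
    """Natural-log transform; only defined for strictly positive flows."""
    flows = np.asarray(flows, dtype=float)
    if np.any(flows <= 0):
        raise ValueError("log transform requires strictly positive flows")
    return np.log(flows)


def inverse_log_transform(values):
    """Map values from log space back to the original flow units."""
    return np.exp(np.asarray(values, dtype=float))


def skewness(values):
    """Sample skewness (third standardized moment, population form)."""
    v = np.asarray(values, dtype=float)
    centered = v - v.mean()
    return float(np.mean(centered**3) / np.mean(centered**2) ** 1.5)


# Synthetic, lognormally distributed "flows" (assumed, not real data).
rng = np.random.default_rng(42)
flows = rng.lognormal(mean=6.0, sigma=0.8, size=500)
log_flows = log_transform(flows)

raw_skew = skewness(flows)      # strongly right-skewed
log_skew = skewness(log_flows)  # much closer to symmetric
```

A model fitted in log space predicts log flows, so its predictions need `inverse_log_transform` before they are reported. Under lognormal errors, the back-transformed prediction corresponds to the median rather than the mean of the flow distribution.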

tjrocha commented 5 years ago

Another thing we can consider is the ability to check for stationarity within the input datasets. I can add this to the new UI that I added to the Data tab. Maybe add another button there that runs a stationarity test using the Augmented Dickey-Fuller (ADF) test.

tjrocha commented 5 years ago

The easiest way to do the stationarity test is to use the statsmodels module in Python. This module also has some built-in methods for the linearity check that this issue calls for. I will be adding this module to the pn-development branch for this issue.

jslanini commented 5 years ago

@kevinfol and I discussed stationarity in terms of forecasting a while back. I'm not really sure how to account for it. If April 1 snowpack is declining, does it directly result in lower runoff, or is there a shift in the form of precipitation (rain vs. snow)? In other words, does the forecast equation already account for the trend, or do we need to account for it separately? Also, I'm not familiar with the ADF test, but I have used Mann-Kendall in the past to analyze trends.
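For comparison, a minimal sketch of the Mann-Kendall trend test in plain Python, using the normal approximation with a tie correction. The function name, the 0.05 threshold, and the example series are assumptions for illustration. The normal approximation is usually reserved for samples that are not very small.

```python
import math
from collections import Counter


def mann_kendall(values, alpha=0.05):
    """Two-sided Mann-Kendall trend test (normal approximation)."""
    x = [float(v) for v in values]
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations")

    # S statistic: sum of the signs of all pairwise later-minus-earlier differences.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S, reduced for groups of tied values.
    tie_term = sum(
        t * (t - 1) * (2 * t + 5) for t in Counter(x).values() if t > 1
    )
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0

    # Continuity-corrected Z score.
    if s == 0 or var_s == 0:
        z = 0.0
    elif s > 0:
        z = (s - 1) / math.sqrt(var_s)
    else:
        z = (s + 1) / math.sqrt(var_s)

    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))

    if p < alpha and z > 0:
        trend = "increasing"
    elif p < alpha and z < 0:
        trend = "decreasing"
    else:
        trend = "no trend"
    return {"s": s, "z": z, "pvalue": p, "trend": trend}


# Hypothetical, steadily declining April 1 snowpack values (not real data).
declining_swe = [30.0 - 0.5 * year for year in range(20)]
swe_result = mann_kendall(declining_swe)
```

Unlike the ADF test, which looks for a unit root, Mann-Kendall tests for a monotonic trend, so the two answer somewhat different questions about a predictor series.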