This overhauls many aspects of the models code. It greatly simplifies the code, and unifies things across HODMD and ARIMA. It also fixes one or two small bugs found elsewhere while working on this.
The simplification comes at a cost: this removes some of the models' more advanced features. I consider this a worthwhile trade-off because those features were either somewhat specific to our old farming use case or otherwise not very relevant or important to DTBase as a base package. The major lost features are:
The models.utils.dataprocessor.clean_data module implemented quite a bespoke data downsampling method, for binning time series data and taking averages. This has been replaced by a one-line call to resample from pandas. It doesn't do exactly the same thing, but I think it serves the same purpose.
models.utils.dataprocessor.prepare_data did some semi-complex time series imputation for missing data. It looked at previous data points at daily, weekly, and custom-length intervals, and took an average of those to impute missing values. So for instance, a missing value on a Monday morning would be the average of previous values on Monday mornings. While useful for some applications, the time periods were hard-coded and wouldn't have been appropriate for many use cases. The ideal fix would have been to generalise this code to handle any periodicity specified by the user, but I didn't have time to do that. Instead, the new code simply uses linear interpolation to impute.
HODMD had two modes: single-measure, in which every sensor measure was forecast separately, and multi-measure, where multiple measures reported by one sensor were forecast together in one run of HODMD. I had to axe the multi-measure mode because I don't have time to implement it nicely with the new code structure, but I would like to restore it, since it seemed useful. It could also be generalised to forecast data from multiple sensors in a single go.
The .ini config files for model parameters are gone. Now that preprocessing is simpler, they seemed like overkill.
Many model parameters that used to have default values, such as a default forecasting period and a default sensor to forecast for, must now be provided explicitly when calling the model.
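For reviewers who want a feel for what the simplified preprocessing amounts to, here is a minimal sketch of the resample-then-interpolate approach described above. It is not the code in this PR: the function name `preprocess`, the 60-minute bin width, and the sample readings are all made up for illustration. Only the pandas calls (`Series.resample(...).mean()` and `Series.interpolate(method="linear")`) are real.

```python
import pandas as pd


def preprocess(series: pd.Series, freq: str = "60min") -> pd.Series:
    """Hypothetical helper: downsample by averaging, then fill gaps linearly."""
    # Bin readings into fixed-width windows and average each bin.
    # Bins with no readings come out as NaN.
    resampled = series.resample(freq).mean()
    # Impute the NaN bins by linear interpolation between neighbouring bins.
    return resampled.interpolate(method="linear")


# Made-up sensor readings with irregular timestamps and nothing between 02:00 and 03:00.
index = pd.to_datetime(
    [
        "2024-01-01 00:10",
        "2024-01-01 00:40",
        "2024-01-01 01:20",
        "2024-01-01 03:30",
    ]
)
readings = pd.Series([10.0, 12.0, 14.0, 20.0], index=index)

clean = preprocess(readings)
print(clean)
```

Unlike the old periodic imputation, this has no notion of daily or weekly patterns: a gap is filled purely from the bins on either side of it.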
This partially addresses #225, although more work could be done, so let's leave it open for now.
Builds on #265, hence the massive diff.