alkaline-ml / pmdarima

A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
https://www.alkaline-ml.com/pmdarima
MIT License

Feature Request: BoxCox Transformation #114

Closed: rohan-gt closed this issue 5 years ago

rohan-gt commented 5 years ago

Could a parameter be added to pre-process multiplicative or non-normal time series with the BoxCox transformation? The BoxCox lambda would be estimated automatically on the training set and stored with the model object, so that predictions can be inverse-transformed.
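For reference, here is a minimal sketch of what the request involves, using plain NumPy. Lambda is estimated on the training series by maximizing the Box-Cox profile log-likelihood over a grid, and the forward and inverse transforms are then applied with that lambda. The function names and the grid range of [-2, 2] are illustrative assumptions, not pmdarima API. In practice, `scipy.stats.boxcox` (which returns the transformed data and the fitted lambda when `lmbda` is not given) and `scipy.special.inv_boxcox` do the same job with a proper optimizer.

```python
import numpy as np


def boxcox(y, lmbda):
    """Box-Cox transform of a strictly positive series for a given lambda."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires strictly positive data")
    if abs(lmbda) < 1e-8:
        return np.log(y)  # the lambda -> 0 limit is the log transform
    return (y ** lmbda - 1.0) / lmbda


def inv_boxcox(z, lmbda):
    """Map transformed values (e.g. forecasts) back to the original scale."""
    z = np.asarray(z, dtype=float)
    if abs(lmbda) < 1e-8:
        return np.exp(z)
    return (lmbda * z + 1.0) ** (1.0 / lmbda)


def boxcox_llf(lmbda, y):
    """Profile log-likelihood of lambda, with additive constants dropped."""
    y = np.asarray(y, dtype=float)
    z = boxcox(y, lmbda)
    return (lmbda - 1.0) * np.sum(np.log(y)) - 0.5 * len(y) * np.log(np.var(z))


def estimate_lambda(y, grid=None):
    """Return the grid value of lambda that maximizes the log-likelihood.

    The default grid (-2 to 2 in steps of 0.01) is an assumption.
    """
    grid = np.linspace(-2.0, 2.0, 401) if grid is None else np.asarray(grid)
    llfs = [boxcox_llf(lam, y) for lam in grid]
    return float(grid[int(np.argmax(llfs))])
```

The fitted lambda is the only state needed later: forecasts produced on the transformed scale go back through `inv_boxcox(forecast, lam)`.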

tgsmith61591 commented 5 years ago

I do think this would be a valuable feature. However, I think it would have to be implemented a bit differently than you've proposed. If we go down the path of adding arbitrary transformations to the auto_arima function itself, what happens when the next person makes a case for including Fourier transforms (see issue #103, for instance)? And then log transformations? Etc.

It's probably best to create a new set of pre-processing objects, similar to scikit-learn's TransformerMixin, that fit and store the lambda parameter and can inverse-transform predictions. That way, users could theoretically stack any number of pre/post-processors together.
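A rough sketch of such an object follows. It uses scikit-learn's transformer conventions (`fit` returns `self`, learned state ends in an underscore, `fit_transform`, `inverse_transform`) without importing scikit-learn. The class name `BoxCoxTransformer` and its `grid` parameter are hypothetical. Lambda is estimated by grid search over the Box-Cox profile log-likelihood, and the search range is an assumption.

```python
import numpy as np


class BoxCoxTransformer:
    """Hypothetical stateful pre-processor: learns lambda in fit, stores it
    as lambda_, and uses it to transform and inverse-transform."""

    def __init__(self, grid=None):
        # Candidate lambdas searched in fit; None means -2..2 in 0.01 steps.
        self.grid = grid

    @staticmethod
    def _forward(y, lmbda):
        return np.log(y) if abs(lmbda) < 1e-8 else (y ** lmbda - 1.0) / lmbda

    def fit(self, y):
        y = np.asarray(y, dtype=float)
        if np.any(y <= 0):
            raise ValueError("Box-Cox requires strictly positive data")
        grid = np.linspace(-2.0, 2.0, 401) if self.grid is None else np.asarray(self.grid, dtype=float)
        sum_log = np.sum(np.log(y))
        # Profile log-likelihood of each candidate lambda (constants dropped).
        llf = [(lam - 1.0) * sum_log - 0.5 * len(y) * np.log(np.var(self._forward(y, lam)))
               for lam in grid]
        # The fitted lambda lives on the object, so it is saved along with it.
        self.lambda_ = float(grid[int(np.argmax(llf))])
        return self

    def transform(self, y):
        return self._forward(np.asarray(y, dtype=float), self.lambda_)

    def fit_transform(self, y):
        return self.fit(y).transform(y)

    def inverse_transform(self, z):
        z = np.asarray(z, dtype=float)
        if abs(self.lambda_) < 1e-8:
            return np.exp(z)
        return (self.lambda_ * z + 1.0) ** (1.0 / self.lambda_)
```

Usage would be `bc = BoxCoxTransformer().fit(train)`, then fit a model on `bc.transform(train)` and map its forecasts back with `bc.inverse_transform(forecasts)`. Because every step exposes the same three methods, steps like this can be chained.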

Thoughts? CC @aaronreidsmith & @charlesdrotar

rohan-gt commented 5 years ago

I'm in favour of adding the Fourier transform too, haha. The problem with doing the pre-processing externally is that information such as the BoxCox lambda, or the number of Fourier terms and the levels of Fourier transforms (for multiple seasonality), has to be stored separately rather than within the model object. That makes it hard to simply load a pickled model object and predict future values. The user has to apply the inverse transform manually using the saved BoxCox lambda, or even reconstruct the multiple Fourier series to be fed in as exogenous variables, when both of these could be learned from the training set and handled internally.
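To make the bookkeeping concrete, here is a sketch that assumes the usual sine/cosine construction of Fourier terms. For each seasonal period m, harmonics j = 1..k contribute the columns sin(2πjt/m) and cos(2πjt/m). Rebuilding the exogenous matrix for a forecast requires the (m, k) pairs and the index where training ended. The helper names are hypothetical.

```python
import numpy as np


def fourier_terms(t, m, k):
    """Sin/cos columns for harmonics j = 1..k of a season of length m.

    Keep k below m / 2: at j = m / 2 the sine column is identically zero.
    """
    t = np.asarray(t, dtype=float)
    cols = []
    for j in range(1, k + 1):
        angle = 2.0 * np.pi * j * t / m
        cols.append(np.sin(angle))
        cols.append(np.cos(angle))
    return np.column_stack(cols)


def multi_seasonal_fourier(n_obs, seasons, start=0):
    """Stack Fourier terms for several seasonal periods.

    seasons: list of (m, k) pairs, e.g. [(7, 3), (365.25, 10)] for daily data
    with weekly and yearly cycles (example values, not recommendations).
    start: time index of the first row; pass the training length to build
    the exogenous features for the forecast horizon.
    """
    t = np.arange(start, start + n_obs)
    return np.hstack([fourier_terms(t, m, k) for m, k in seasons])
```

For a training series `y` and horizon `h`, the training features would be `multi_seasonal_fourier(len(y), seasons)` and the future features `multi_seasonal_fourier(h, seasons, start=len(y))`. So `seasons` and `len(y)` are exactly the state that must travel with the model.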

tgsmith61591 commented 5 years ago

This is why scikit-learn added the Pipeline object, and we'd have to do something similar so that you could deserialize and forecast in one shot.
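Below is a minimal sketch of the Pipeline idea, assuming endogenous-only transforms (exogenous features are left out). Each transformer is fit in order, the final model is fit on the transformed series, and forecasts are passed back through each `inverse_transform` in reverse order. All class names are hypothetical. `DriftForecaster` is a naive stand-in for an ARIMA model so that the example runs without pmdarima, and this is not the pipeline that was later added to the library. Because the fitted state lives on the step objects, pickling the pipeline keeps everything needed to forecast in one shot.

```python
import pickle

import numpy as np


class LogTransformer:
    """Hypothetical stateless step: natural log and its inverse."""

    def fit(self, y):
        return self

    def transform(self, y):
        return np.log(y)

    def inverse_transform(self, z):
        return np.exp(z)


class Standardizer:
    """Hypothetical stateful step: stores the mean and std learned in fit."""

    def fit(self, y):
        self.mean_, self.std_ = float(np.mean(y)), float(np.std(y))
        return self

    def transform(self, y):
        return (np.asarray(y, dtype=float) - self.mean_) / self.std_

    def inverse_transform(self, z):
        return np.asarray(z, dtype=float) * self.std_ + self.mean_


class DriftForecaster:
    """Stand-in model: naive random-walk-with-drift forecast."""

    def fit(self, y):
        y = np.asarray(y, dtype=float)
        self.last_ = y[-1]
        self.drift_ = (y[-1] - y[0]) / (len(y) - 1)
        return self

    def predict(self, n_periods):
        return self.last_ + self.drift_ * np.arange(1, n_periods + 1)


class Pipeline:
    """Fit transformers in order, fit the final model on the transformed
    series, and undo the transforms (in reverse order) on forecasts."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, object); the last one is the model

    def fit(self, y):
        z = np.asarray(y, dtype=float)
        for _, step in self.steps[:-1]:
            z = step.fit(z).transform(z)
        self.steps[-1][1].fit(z)
        return self

    def predict(self, n_periods):
        z = self.steps[-1][1].predict(n_periods)
        for _, step in reversed(self.steps[:-1]):
            z = step.inverse_transform(z)
        return z
```

For example, fit `Pipeline([("log", LogTransformer()), ("scale", Standardizer()), ("model", DriftForecaster())])` on a series, pickle it, and call `pickle.loads(blob).predict(3)` in another session. No lambda, mean, or standard deviation has to be carried around separately.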

I am strongly opposed to cramming pre-processing stages into the function call. Adding lambda would make it truer to the R version, but at the cost of more args and more branching logic.

tgsmith61591 commented 5 years ago

Plus, to do it the way you're asking, we'd also have to force this into the ARIMA class itself. Best to keep logically separate processes separate from one another.

tgsmith61591 commented 5 years ago

#121 adds the pipeline we discussed as well as a Fourier exogenous featurizer. These will be present in v1.2.0.