MLBazaar / MLPrimitives

Primitives for machine learning and data science.
https://mlbazaar.github.io/MLPrimitives
MIT License
70 stars 38 forks source link

Add new primitive: Arima model #168

Closed AlexanderGeiger closed 5 years ago

AlexanderGeiger commented 5 years ago

Description

ARIMA models are often used to describe time series data. Therefore we should add an Arima primitive for time series forecasting. We can use the statsmodels library.

The primitive takes an array as input, on which an Arima model is fitted. The forecast method returns another array of predictions.

What I Did

I started implementing this primitive for testing purposes in the arima branch on my fork, which you can check out. Concretely, I included an adapter for statsmodels and a Primitive JSON file.

Any feedback on the primitive itself and the implementation would be highly appreciated.

csala commented 5 years ago

Thanks for opening the issue @AlexanderGeiger, but I have some concerns about the proposed approach and implementation.

My understanding of the ARIMA model is that it is fitted on a single sequence of datapoints (a single time series) and then it is used to forecast one or more values immediately after that sequence which it has been fitted to.

On the contrary, models like an LSTM based regressor are fitted on both a collection of sequences of data points and the values after them and later on they can be used to forecast the values after any new sequence of data points that they are given.

In order words, even though the ARIMA class has a fit method, I think that it does not really have a learning stage equivalent to the one that models like LSTM have and that the ARIMA adapter should be a stateless function (or method) instead of a class with both a fit and a predict method.

So, when it comes to the exact implementation, I think the class prototype should be something like this:


from statsmodels.tsa import arima_model

class ARIMA:

    def __init__(self, ...):
        # Capture here all the arguments that are actually hyperparameters
        # and store them as class attributes.

    def forecast(self, X):
        ...  # manipulate X as necessary here
        arima = arima_model.ARIMA(X, ...)
        arima_results = arima.fit(...)
        y = arima_results.forecast(...)
        ... # manipulate y as necessary here
        return y

then, at the JSON level, you would specify only the predict method:

{
    ...
    "name": "statsmodels.tsa.arima_model.ARIMA",
    "primitive": "mlprimitives.adapters.statsmodels.ARIMA",
    "produce": {
        "method": "forecast",
        ...
    },
    ...
}

Also note the name of the class: when writing an adapter, and as far as it possible, make the name of the class or function be exactly the same as the one that you are adapting. So, in this case, the adapter class should be called ARIMA instead of arima_model

AlexanderGeiger commented 5 years ago

Hi @csala , Thank you for the feedback! I changed the structure of the Arima primitive accordingly. The primitive now takes an array of time series data as input and produces another array of predictions. The number of predictions is defined in the steps hyperparameter. The json can be found here and the Arima class here. Please let me know if that is the approach you had in mind.

csala commented 5 years ago

Hi @AlexanderGeiger Yes! Now it looks exactly as I had thought about it. :-)

It would be great to also add a pipeline that uses it, but I'm afraid there is no timeseries dataset yet in the repo, so perhaps you can create the PR without the pipeline and we can add it later when we add the dataset.