alkaline-ml / pmdarima

A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
https://www.alkaline-ml.com/pmdarima
MIT License

Better explanation of error #557

Open XJTLUmedia opened 1 year ago

XJTLUmedia commented 1 year ago

Is your feature request related to a problem? Please describe.

I'm confused by the error raised by this code snippet:

import pandas as pd
import numpy as np
import plotly.graph_objs as go
from statsmodels.tsa.arima.model import ARIMA

# Create a sample DataFrame with two years of GNI data for each continent
data = {'continent': ['Asia', 'Asia', 'Europe', 'Europe', 'Africa', 'Africa', 'North America', 'North America', 'South America', 'South America'],
        'year': [2018, 2019, 2018, 2019, 2018, 2019, 2018, 2019, 2018, 2019],
        'GNI': [1000, 1500, 2000, 2500, 1200, 1800, 1500, 2000, 1400, 1600]}
df = pd.DataFrame(data)

# Convert the 'year' column to datetime
df['year'] = pd.to_datetime(df['year'], format='%Y')

# Set the 'year' column as the index
df.set_index('year', inplace=True)

# Fit the ARIMA model for each continent
forecasts = {}
for continent in df['continent'].unique():
    continent_data = df[df['continent'] == continent]
    model = ARIMA(continent_data['GNI'], order=(1, 1, 1))  # Specify the ARIMA parameters manually
    model_fit = model.fit()
    forecast = model_fit.forecast(steps=1)
    forecasts[continent] = forecast[0]

# Print the forecasted GNI for each continent
for continent, forecast in forecasts.items():
    print(f"Forecasted GNI for {continent} in the next year:", forecast)

# Visualize the original and forecasted GNI for each continent using Plotly
fig = go.Figure()
for continent in df['continent'].unique():
    continent_data = df[df['continent'] == continent]
    fig.add_trace(go.Scatter(x=continent_data.index, y=continent_data['GNI'], mode='lines', name=continent))
    fig.add_trace(go.Scatter(x=[continent_data.index[-1], continent_data.index[-1] + pd.DateOffset(years=1)], y=[continent_data['GNI'].iloc[-1], forecasts[continent]], mode='lines', name=f"Forecasted GNI ({continent})"))
fig.update_layout(title='GNI Forecast by Continent', xaxis_title='Year', yaxis_title='GNI')
fig.show()

It raises this error: IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed

Describe the solution you'd like

This error is confusing. The problem is caused by having too few samples in each continent's subset of the DataFrame: each one contains only 2 observations.

Describe alternatives you've considered

It should be something like ValueError: "You need at least 3 values to produce a reasonable result".
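As a rough illustration of the kind of check I mean, here is a minimal sketch. The helper name `require_min_samples` and the default threshold of 3 are hypothetical; they are not part of pmdarima or statsmodels, and the right minimum probably depends on the model order.

```python
from typing import Sized


def require_min_samples(values: Sized, min_samples: int = 3, name: str = "series") -> None:
    """Raise a descriptive ValueError when `values` is too short to model.

    Hypothetical helper: neither the name nor the default threshold of 3
    comes from pmdarima or statsmodels.
    """
    n = len(values)
    if n < min_samples:
        raise ValueError(
            f"{name} has {n} observation(s); at least {min_samples} are "
            "needed to produce a reasonable result"
        )


# Two yearly values per continent, mirroring the sample data in the snippet.
gni_by_continent = {
    "Asia": [1000, 1500],
    "Europe": [2000, 2500],
}

# Collect the messages instead of crashing, so every short series is reported.
messages = {}
for continent, gni in gni_by_continent.items():
    try:
        require_min_samples(gni, name=f"GNI for {continent}")
    except ValueError as exc:
        messages[continent] = str(exc)

for message in messages.values():
    print(message)
```

Called before fitting, a check like this would point at the short input directly instead of surfacing an IndexError from deep inside the fitting code.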

Additional Context

No response