AlbertoAlmuinha / neuralprophet

NeuralProphet Algorithm in R for Tidymodels

Error: Neural Prophet Autoregression Example #8

Closed mdancho84 closed 2 years ago

mdancho84 commented 3 years ago

I'm running into an error when following the Yosemite Temperature Autoregression Example.

Adding the n_lags and n_forecasts arguments seems to break prediction: predict() returns NULL for every row instead of numeric values, and modeltime_forecast() fails.

library(neuralprophet)
library(modeltime)
library(tidymodels)
library(tidyverse)
library(timetk)
library(lubridate)

# Data

path <- "https://raw.githubusercontent.com/ourownstory/neural_prophet/master/example_data/yosemite_temps.csv"

yosemite_temps_tbl <- read_csv(path) %>%
    set_names(c('date','value'))
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   ds = col_datetime(format = ""),
#>   y = col_double()
#> )

data <- yosemite_temps_tbl

splits <- data %>% initial_time_split(prop = 0.9)

# Model Spec
model_spec <- neural_prophet(
    freq   = "5min", 
    n_lags = 6*12,
    n_forecasts = 3*12,
    seasonality_weekly = FALSE, 
    changepoint_range = 0.95, 
    changepoint_num = 30,
    batch_size = 64,
    epochs = 10,
    learn_rate = 1.0
) %>%
    set_engine("prophet")

# Fit Spec
model_fit <- model_spec %>%
    fit(value ~ date, data = training(splits))
model_fit
#> parsnip model object
#> 
#> Fit time:  6.8s 
#> Neural Prophet
#> Model: Neural Prophet
#> <neuralprophet.forecaster.NeuralProphet>

# Predictions

modeltime_table(
    model_fit
) %>%
    modeltime_forecast(
        new_data    = testing(splits),
        actual_data = data
    ) %>%
    plot_modeltime_forecast()
#> Error: Problem occurred during prediction. Error: Can't recycle `..1` (size 1908) to match `..2` (size 1873).
#> Warning: Unknown or uninitialised column: `.key`.
#> Error: Problem with `filter()` input `..1`.
#> ℹ Input `..1` is `.model_desc == "ACTUAL" | .key == "prediction"`.
#> x object '.key' not found

predict(model_fit, testing(splits))
#> # A tibble: 1,908 x 1
#>    .pred 
#>    <list>
#>  1 <NULL>
#>  2 <NULL>
#>  3 <NULL>
#>  4 <NULL>
#>  5 <NULL>
#>  6 <NULL>
#>  7 <NULL>
#>  8 <NULL>
#>  9 <NULL>
#> 10 <NULL>
#> # … with 1,898 more rows

Created on 2021-07-08 by the reprex package (v2.0.0)

AlbertoAlmuinha commented 2 years ago

Ok, let me take a look at it to see what might be going on.

AlbertoAlmuinha commented 2 years ago

Hi Matt, I have looked into this and changed the package so that it behaves the way NeuralProphet itself does. What does that mean? The predictions are now shortened by n_lags observations at the beginning and by n_forecasts observations at the end, because NeuralProphet returns None for those rows. (You can check this in the Yosemite example in Colab, where 72 observations are dropped from the beginning and 36 from the end; hence the NULLs you saw.)
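
To make the trimming concrete, here is a minimal sketch in plain Python. It only mirrors the description above (no prediction for the first n_lags rows or the last n_forecasts rows) and is not the package's actual implementation; the helper names `mask_unpredictable` and `drop_missing` are made up for illustration, and the exact row counts returned by the real model may differ slightly.

```python
from typing import List, Optional


def mask_unpredictable(values: List[float], n_lags: int, n_forecasts: int) -> List[Optional[float]]:
    """Hypothetical helper: blank out rows that, per the description above,
    get no prediction (the first n_lags and the last n_forecasts rows)."""
    n = len(values)
    return [v if n_lags <= i < n - n_forecasts else None for i, v in enumerate(values)]


def drop_missing(preds: List[Optional[float]]) -> List[float]:
    """Remove the None entries, which shortens the prediction vector."""
    return [p for p in preds if p is not None]


# Toy series of 20 points with n_lags = 6 and n_forecasts = 3 (illustrative values).
raw = mask_unpredictable([float(i) for i in range(20)], n_lags=6, n_forecasts=3)
kept = drop_missing(raw)
print(len(raw), len(kept))
```

In this toy case the trimmed vector is shorter than the input by n_lags + n_forecasts, which is the kind of length difference that downstream code has to cope with.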

Here is a working example with predict:

library(neuralprophet)
#> Loading required package: modeltime
library(modeltime)
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#>   method                   from   
#>   required_pkgs.model_spec parsnip
library(tidyverse)
library(timetk)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

# Data

path <- "https://raw.githubusercontent.com/ourownstory/neural_prophet/master/example_data/yosemite_temps.csv"

yosemite_temps_tbl <- read_csv(path) %>%
    set_names(c('date','value'))
#> 
#> -- Column specification --------------------------------------------------------
#> cols(
#>   ds = col_datetime(format = ""),
#>   y = col_double()
#> )

data <- yosemite_temps_tbl

splits <- data %>% initial_time_split(prop = 0.9)

# Model Spec
model_spec <- neural_prophet(
    freq   = "5min", 
    n_lags = 6*12,
    n_forecasts = 3*12,
    seasonality_weekly = FALSE, 
    changepoint_range = 0.95, 
    changepoint_num = 30,
    batch_size = 64,
    epochs = 10,
    learn_rate = 1.0
) %>%
    set_engine("prophet")

# Fit Spec
model_fit <- model_spec %>%
    fit(value ~ date, data = training(splits))
model_fit
#> parsnip model object
#> 
#> Fit time:  27.7s 
#> Neural Prophet
#> Model: Neural Prophet
#> <neuralprophet.forecaster.NeuralProphet>

predict(model_fit, testing(splits))
#> # A tibble: 1,802 x 1
#>    .pred
#>    <dbl>
#>  1  18.7
#>  2  18.6
#>  3  18.6
#>  4  18.5
#>  5  18.4
#>  6  18.2
#>  7  18.1
#>  8  17.9
#>  9  17.7
#> 10  17.6
#> # ... with 1,792 more rows

Where is the problem, then? modeltime_forecast() expects a prediction vector with the same length as new_data, so it now fails on the length mismatch: it expects 1873 values and gets 1802 (the rows for which the lags return None have been removed).

So we need to find a solution to this problem.

AlbertoAlmuinha commented 2 years ago

Hi @mdancho84 ,

I have solved this by padding the front of the prediction vector with NAs, as many as n_lags, so that it has the expected length. (I believe the error that appears in the legends of the images comes from the NA predictions at the beginning, which are dropped.) You can see the result below:

img

img2

img3
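
The padding idea in this comment can be sketched as follows, in plain Python with None standing in for R's NA. This is an illustration under assumptions, not the code in the package: the helper name `pad_front` is hypothetical, and it pads up to a target length rather than by exactly n_lags entries, which is a slight generalization of what the comment describes.

```python
from typing import List, Optional, Sequence


def pad_front(preds: Sequence[float], target_len: int) -> List[Optional[float]]:
    """Hypothetical helper: left-pad preds with None so that the result
    has exactly target_len entries, one per row of the new data."""
    missing = target_len - len(preds)
    if missing < 0:
        raise ValueError("preds is longer than target_len")
    return [None] * missing + list(preds)


# Toy example: 5 rows of new data, but only 2 predictions came back.
padded = pad_front([1.0, 2.0], target_len=5)
print(padded)  # [None, None, None, 1.0, 2.0]
```

Padding at the front keeps the real predictions aligned with the latest timestamps, and the placeholder rows can be dropped later when plotting.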

beaverseven commented 9 months ago

Hello, I am following this thread because I am running into a similar error while trying to forecast data in a similar fashion. However, when I use n_lags and n_historic_predictions, I do not get any predictions at all. I tried the solution you proposed, but it still doesn't work.

Here's a snippet of the code:

from neuralprophet import NeuralProphet

# `data` is the poster's input DataFrame (not shown in the original post)
model = NeuralProphet(
    growth="off",  # Determine trend types: 'linear', 'discontinuous', 'off'
    # changepoints=None,  # list of dates that may include change points (None -> automatic)
    n_changepoints=0,
    # changepoints_range=0,
    # trend_reg=0,
    # trend_reg_threshold=False,
    # # seasonality_reg=1,
    # # d_hidden = 0,
    n_lags=10,
    # # num_hidden_layers=0,  # Dimension of hidden layers of AR-Net
    # # ar_reg=None,  # Sparsity in the AR coefficients
    learning_rate=0.01,
    epochs=100,
    normalize="auto",  # Type of normalization ('minmax', 'standardize', 'soft', 'off')
    impute_missing=True,
    yearly_seasonality=True,
    weekly_seasonality=False,
    daily_seasonality=False,
    seasonality_mode="multiplicative",
    loss_func="MSE",
)

# Fit the model to the training data
model.fit(data, freq="D")
future = model.make_future_dataframe(data, periods=1000, n_historic_predictions=len(data))
forecast = model.predict(future)

prediction