JaiPizGon / TSLSTMplus

GNU General Public License v3.0

about function predict() #1

Closed tianliu88 closed 3 months ago

tianliu88 commented 5 months ago

Excellent package, but I encountered a problem while using it and would like to ask for your advice. When using the predict function, I get one more predicted value than expected, and I can't figure out which values I should use as my predictions. For example, I used the monthly incidence numbers of hand, foot and mouth disease in the United States from 2005 to 2022 to train a model and predict the incidence for the 12 months of 2023, but the final number of predicted values was 13. Which of the 13 values are the numbers of hand, foot and mouth disease cases in the United States in 2023? Looking forward to your reply, and thank you again for developing such an excellent package!

JaiPizGon commented 5 months ago

Hi!

Thanks for using the package; it is still in beta and might produce problems like this.

Could you provide the code with a reproducible example?

tianliu88 commented 5 months ago

Thank you for your reply! Following the example in the TSLSTMplus package, we build an LSTM model with a lag of order 2 that includes the regressor x to predict the variable y. When making predictions, we use the code future_values <- predict(TSLSTM, horizon=50, xreg = x, ts = y, xreg.new = x.ts) to forecast the next 50 values (that is, the 101st to 150th values of y). However, future_values is a vector of length 51, and I don't know which 50 of the 51 values are the predictions of y. The example code:

library(TSLSTMplus)
y <- rnorm(100, mean = 100, sd = 50)
x1 <- rnorm(150, mean = 50, sd = 50)
x2 <- rnorm(150, mean = 50, sd = 25)
x <- cbind(x1, x2)
x.tr <- x[1:100, ]
x.ts <- x[101:150, ]
TSLSTM <- ts.lstm(ts = y, xreg = x.tr, tsLag = 2, xregLag = 0, LSTMUnits = 5,
                  ScaleInput = 'scale', ScaleOutput = 'scale', Epochs = 2)
current_values <- predict(TSLSTM, xreg = x.tr, ts = y)
future_values <- predict(TSLSTM, horizon = 50, xreg = x, ts = y, xreg.new = x.ts)
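For reference, checking the length of the forecast after running the example above:

length(future_values)   # 51, instead of the expected 50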


JaiPizGon commented 5 months ago

Hello again,

You were right, there was an extra sample being predicted. Version 1.0.2 is on its way to CRAN and that bug seems fixed.

If you want, download the new version from GitHub using devtools::install_github("JaiPizGon/TSLSTMplus") and check whether the problem is solved.
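For example, one quick check (a sketch; refit the model exactly as in your reproducible example above) would be:

# Reinstall the development version and re-run the earlier example
devtools::install_github("JaiPizGon/TSLSTMplus")
library(TSLSTMplus)
# ... refit TSLSTM with ts.lstm() as before, then:
future_values <- predict(TSLSTM, horizon = 50, xreg = x, ts = y, xreg.new = x.ts)
length(future_values)   # should now be 50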

Thanks for using the package!

tianliu88 commented 5 months ago

After testing, TSLSTMplus version 1.0.2 produces the expected predictions for the example from my previous email. However, I encountered errors when running it on my own data. Due to my limited proficiency, I am unable to identify the source of the error, and I would appreciate it if you could help me by reviewing the attached data and code.

Additionally, I noticed that setting the parameters ScaleInput and ScaleOutput to "minmax" also results in an error. I hope you can assist with this issue as well. You can use the data I sent you for testing. The data, named 'a', is a data frame with 7 columns. I intend to establish 7 LSTM models for 7 sets of data and predict the next 12 data points for each.
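For example, a call along these lines (reusing the toy example from my earlier message, not my actual data) already triggers the error:

# Same toy example as before, but with min-max scaling; this workflow errors for me
TSLSTM_mm <- ts.lstm(ts = y, xreg = x.tr, tsLag = 2, xregLag = 0, LSTMUnits = 5,
                     ScaleInput = 'minmax', ScaleOutput = 'minmax', Epochs = 2)
pred_mm <- predict(TSLSTM_mm, xreg = x.tr, ts = y)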

Once again, thank you for your attentive guidance, and I look forward to your response!


JaiPizGon commented 5 months ago

Hello again,

You were right, there was a bug in the minmax scaler: I forgot to store the minimum and range values of the variables, which are needed to reverse the scaling of the predictions. I have uploaded version 1.0.3 of the package to GitHub with that bug corrected.
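To illustrate (a generic sketch of min-max scaling, not the package's internal code), both the minimum and the range of each variable must be kept to map the network's output back to the original scale:

# Min-max scaling and its inverse: y_min and y_range must be stored to undo the scaling
y <- rnorm(100, mean = 100, sd = 50)
y_min <- min(y)
y_range <- max(y) - y_min
y_scaled <- (y - y_min) / y_range        # values in [0, 1], used to train the network
y_back <- y_scaled * y_range + y_min     # inverse transform back to the original scale
all.equal(y, y_back)                     # TRUE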

Regarding the use case you described ("The data, named 'a', is a data frame with 7 columns. I intend to establish 7 LSTM models for 7 sets of data and predict the next 12 data points for each."): I could not find any data attached to your message, so I created the following synthetic dataset resembling the description you provided.

# Load necessary libraries
library(keras)
library(dplyr)
library(tibble)
library(forecast)   # provides subset() for ts objects, used in prepare_data() below
library(TSLSTMplus)

# Function to create a time-series relationship for Y
create_Y <- function(X) {
    # Example operation: Y is the sum of all X columns at the current time step
    # minus the sum of all X columns at the previous time step.
    lagged_X <- lag(X)
    Y <- rowSums(X, na.rm = TRUE) - rowSums(lagged_X, na.rm = TRUE)
    return(Y)
}

# Generate a large data frame
set.seed(123) # For reproducibility
n <- 10003 # Total number of observations
a <- as.data.frame(matrix(runif(n * 6), nrow = n, ncol = 6))
colnames(a) <- paste0("X", 1:6)

# Creating the Y column with a time-series relationship
a$Y <- create_Y(a)

# Splitting the data frame into 7 datasets
split_size <- nrow(a) / 7
datasets <- split(a, rep(1:7, each = split_size))

# Prepare each dataset for training and testing
prepare_data <- function(dataset) {
    n <- nrow(dataset)
    fdata <- ts(dataset)
    split_index <- (n-11)
    fdata_tr <- subset(fdata, end = split_index)
    fdata_ts <- subset(fdata, start = split_index + 1)
    x_ts <- fdata_ts[,1:6]
    y_ts <- fdata_ts[,7]
    x_tr <- fdata_tr[,1:6]
    y_tr <- fdata_tr[,7]
    list(x_tr = x_tr, y_tr = y_tr, x_ts = x_ts, y_ts = y_ts)
}

prepared_datasets <- lapply(datasets, prepare_data)

# The train/test split (last 12 samples as test) is already done in prepare_data()
TSLSTM_models <- list() # To store the models
pred_tr_list <- list() # To store training predictions
pred_ts_list <- list() # To store test predictions

for (i in 1:length(prepared_datasets)) {
    # Split between output and inputs
    x_tr <- prepared_datasets[[i]]$x_tr
    y_tr <- prepared_datasets[[i]]$y_tr
    x_ts <- prepared_datasets[[i]]$x_ts
    y_ts <- prepared_datasets[[i]]$y_ts

    # Train the model for data.frame i
    TSLSTM <- ts.lstm(ts=y_tr,
                      xreg = x_tr,
                      tsLag = 5,
                      xregLag = 5, 
                      LSTMUnits = c(64, 32),
                      Epochs = 20,
                      ScaleOutput = 'scale', # NULL, 'minmax'
                      ScaleInput = 'scale', # NULL, 'minmax'
                      BatchSize = 64,
                      LSTMActivationFn = 'tanh',
                      LSTMRecurrentActivationFn = 'sigmoid',
                      DenseActivationFn = 'relu',
                      ValidationSplit = 0.2,
                      verbose=1,
                      RandomState=150,
                      LagsAsSequences = FALSE,
                      Stateful = FALSE
    )

    # Store the model
    TSLSTM_models[[i]] <- TSLSTM

    # Create predictions for training
    pred_tr <- predict(TSLSTM, xreg = x_tr, ts = y_tr)

    # Store training predictions
    pred_tr_list[[i]] <- pred_tr

    # Create predictions for test
    pred_ts <- predict(TSLSTM, horizon = length(y_ts), xreg = x_tr, ts = y_tr, 
                       xreg.new = x_ts, BatchSize = 1)

    # Store test predictions
    pred_ts_list[[i]] <- pred_ts
}

library(fpp2)
library(plotly)

# Initialize an empty list to store plots
plots_list <- list()

# Loop through each dataset and create plots
for (i in 1:length(prepared_datasets)) {
    y_tr <- prepared_datasets[[i]]$y_tr
    y_ts <- prepared_datasets[[i]]$y_ts
    pred_tr <- pred_tr_list[[i]]
    pred_ts <- pred_ts_list[[i]]

    # Create a plot for each dataset
    p <- autoplot(y_tr, series = "Train") +
        autolayer(pred_tr, series = "Predicted Train") +
        autolayer(y_ts, series = "Test") +
        autolayer(pred_ts, series = "Predicted Test (in batches)") +
        labs(x = "Time", y = "Values") +
        theme_minimal()

    # Store the plot
    plots_list[[i]] <- p
}

# Convert to interactive plotly plot
plotly_combined_plot <- lapply(plots_list, ggplotly)
subplot(plotly_combined_plot, nrows=4)

In the example, I create one big dataset and then split it into 7 different subsets. From each of these subsets, I leave the last 12 samples as the test set and use the rest for training. Then, for each training and test pair, I train an LSTM model and create the predictions. Lastly, I plot the real data and the model's predictions for each dataset.
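If it helps, a quick numeric check (an optional addition, not part of the script above) would be the test RMSE of each subset:

# Optional: test RMSE per subset, using the objects created by the script above
test_rmse <- sapply(seq_along(prepared_datasets), function(i) {
    obs  <- as.numeric(prepared_datasets[[i]]$y_ts)
    pred <- as.numeric(pred_ts_list[[i]])
    sqrt(mean((obs - pred)^2))
})
round(test_rmse, 3)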

Please try it out on your data and reach out again with any issue you encounter.

JaiPizGon commented 5 months ago

Hello again,

Just checking in to see if the solution worked.

If I receive no answer during the week, I will close the issue as completed.

Thanks for using the package!