AI-Guru / helibrunna

A HuggingFace compatible Small Language Model trainer.
GNU Affero General Public License v3.0

Time series prediction #3

Open Cram3r95 opened 1 month ago

Cram3r95 commented 1 month ago

Hi @AI-Guru, I think this kind of network is really interesting for time series prediction. I want to predict a single scalar for a particular use case, given 6 inputs. Based on the original code, it should look like the following:

from xlstm import xLSTMBlockStack, xLSTMBlockStackConfig, mLSTMBlockConfig, mLSTMLayerConfig, sLSTMBlockConfig, sLSTMLayerConfig, FeedForwardConfig
import torch
import torch.nn as nn
from dacite import from_dict
from dacite import Config as DaciteConfig

# Define the configuration
xlstm_cfg = {
    "mlstm_block": {
        "mlstm": {
            "conv1d_kernel_size": 4,
            "qkv_proj_blocksize": 4,
            "num_heads": 4
        }
    },
    "slstm_block": {
        "slstm": {
            "backend": "cuda",
            "num_heads": 4,
            "conv1d_kernel_size": 4,
            "bias_init": "powerlaw_blockdependent"
        },
        "feedforward": {
            "proj_factor": 1.3,
            "act_fn": "gelu"
        }
    },
    "context_length": 10,
    "num_blocks": 7,
    "embedding_dim": 128,
    "slstm_at": [1]
}

# Convert the dictionary into the xLSTMBlockStackConfig dataclass
config = from_dict(data_class=xLSTMBlockStackConfig, data=xlstm_cfg, config=DaciteConfig(strict=True))

# Initialize xLSTMBlockStack
xlstm_stack = xLSTMBlockStack(config).to("cuda")

# Define a model that applies xLSTMBlockStack for your time series data
class nxFFBnet(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(nxFFBnet, self).__init__()
        self.xlstm_stack = xlstm_stack
        self.fc1 = nn.Linear(hidden_dim, hidden_dim // 2)
        self.fc2 = nn.Linear(hidden_dim // 2, output_dim)
        self.relu = nn.ReLU()

    def forward(self, x):
        # Reshape input to match expected dimensions for xLSTM
        x = x.permute(0, 2, 1)  # Shape: [batch_size, input_dim, sequence_length]

        # Apply xLSTM stack
        x = self.xlstm_stack(x)  # Expected shape: [batch_size, sequence_length, hidden_dim]

        # Use output from the last timestep
        x = x[:, -1, :]  # Shape: [batch_size, hidden_dim]

        # Pass through fully connected layers
        x = self.relu(self.fc1(x))  # Shape: [batch_size, hidden_dim // 2]
        return self.fc2(x)

# Define model parameters
input_dim = 6  # 6 input features (e.g., time-series variables)
hidden_dim = 128
output_dim = 1  # Output is a scalar (e.g., qSteer)

# Instantiate the model
model = nxFFBnet(input_dim=input_dim, hidden_dim=hidden_dim, output_dim=output_dim).to("cuda")

# Example dummy input (batch_size=32, sequence_length=10, input_dim=6)
dummy_input = torch.randn(32, 10, input_dim).to('cuda')

# Forward pass through the model
output = model(dummy_input)
print(output.shape)  # Expected output shape: [batch_size, output_dim]

Nevertheless, I keep receiving the same error:

return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)

RuntimeError: Given normalized_shape=[128], expected input with shape [*, 128], but got input of size [32, 6, 10]

Is it not possible to use the model for time series prediction?

Obviously my input will be something like:

batch, context_len, inputs -> e.g. 32, 10, 6

AI-Guru commented 1 month ago

Howdy @Cram3r95! Great work! I can already feel the pull request!

It is hard to tell which line raises the error. Please provide the stack trace.

AI-Guru commented 1 month ago

@Cram3r95 ah no worries! I just saw it! You need to project your input feature space into embedding space.

Try this:

# Define a model that applies xLSTMBlockStack for your time series data
class nxFFBnet(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(nxFFBnet, self).__init__()
        self.xlstm_stack = xlstm_stack
        self.fc_outward = nn.Linear(input_dim, hidden_dim)
        self.fc1 = nn.Linear(hidden_dim, hidden_dim // 2)
        self.fc2 = nn.Linear(hidden_dim // 2, output_dim)
        self.relu = nn.ReLU()

    def forward(self, x):
        # Apply outward linear layer
        x = self.fc_outward(x)
        print("After fc_outward:", x.shape)

        # Reshape input to match expected dimensions for xLSTM
        #x = x.permute(0, 2, 1)  # Shape: [batch_size, input_dim, sequence_length]
        #print("After permute:", x.shape)

        # Apply xLSTM stack
        x = self.xlstm_stack(x)  # Expected shape: [batch_size, sequence_length, hidden_dim]
        print("After stack:", x.shape)

        # Use output from the last timestep
        x = x[:, -1, :]  # Shape: [batch_size, hidden_dim]
        print("After last timestep:", x.shape)

        # Pass through fully connected layers
        x = self.relu(self.fc1(x))  # Shape: [batch_size, hidden_dim // 2]
        print("After fc1:", x.shape)
        return self.fc2(x)

Not sure if permute is really necessary. Please check.
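
For what it's worth, a quick shape check (a minimal sketch, reusing the xlstm_stack defined above) suggests it is not: xLSTMBlockStack consumes [batch_size, sequence_length, embedding_dim] directly.

# Shape check: project the 6 input features to embedding_dim=128 first,
# then feed the result straight into the stack without permuting.
proj = nn.Linear(6, 128).to("cuda")
x = torch.randn(32, 10, 6).to("cuda")  # [batch_size, context_length, input_dim]
x = proj(x)                            # [32, 10, 128]
y = xlstm_stack(x)                     # [32, 10, 128], last dim matches embedding_dim
print(y.shape)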

Cram3r95 commented 1 month ago

@AI-Guru yes, I solved it myself. I still need to finish the experimental setup (i.e. the LR scheduler, etc.). By the way, that code was not from your repo. I am using the xLSTM blocks from NX-AI, which I believe is the official implementation.

This is my xLSTM model:

# Define the configuration for xLSTM
xlstm_cfg = {
    "mlstm_block": {
        "mlstm": {
            "conv1d_kernel_size": 4,
            "qkv_proj_blocksize": 4,
            "num_heads": 4
        }
    },
    "slstm_block": {
        "slstm": {
            "backend": "cuda",
            "num_heads": 4,
            "conv1d_kernel_size": 4,
            "bias_init": "powerlaw_blockdependent"
        },
        "feedforward": {
            "proj_factor": 1.3,
            "act_fn": "gelu"
        }
    },
    "context_length": 10,
    "num_blocks": 7,
    "embedding_dim": 128,  # This is the hidden_dim expected by xLSTM
    "slstm_at": [1]
}

# Convert the dictionary into the xLSTMBlockStackConfig dataclass
config = from_dict(data_class=xLSTMBlockStackConfig, data=xlstm_cfg, config=DaciteConfig(strict=True))

#######################################

# Define the model with an embedding (or projection) layer
class nxFFBnet_xlstm(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(nxFFBnet_xlstm, self).__init__()

        # Add an embedding or linear projection layer to match the input_dim to hidden_dim
        self.embedding = nn.Linear(input_dim, hidden_dim)  # Project from input_dim (6) to hidden_dim (128)

        self.xlstm_stack = xLSTMBlockStack(config).to("cuda")  # Use the xLSTMBlockStack
        self.fc1 = nn.Linear(hidden_dim, hidden_dim // 2)
        self.fc2 = nn.Linear(hidden_dim // 2, output_dim)
        self.relu = nn.ReLU()

    def forward(self, x):
        # Project the input features to the required hidden_dim (128)
        x = self.embedding(x)  # Shape: [batch_size, sequence_length, hidden_dim]

        # Apply xLSTM stack
        x = self.xlstm_stack(x)  # Shape: [batch_size, sequence_length, hidden_dim]

        # Use the output from the last timestep
        x = x[:, -1, :]  # Shape: [batch_size, hidden_dim]

        # Pass through fully connected layers
        x = self.relu(self.fc1(x))  # Shape: [batch_size, hidden_dim // 2]
        return self.fc2(x)  # Shape: [batch_size, output_dim]

if __name__ == "__main__":
    # Define model parameters
    input_dim = 6  # 6 input features (e.g., time-series variables)
    hidden_dim = 128  # The hidden dimension expected by xLSTM
    output_dim = 1  # Output is a scalar (e.g., qSteer)

    # Instantiate the model
    model = nxFFBnet_xlstm(input_dim=input_dim, hidden_dim=hidden_dim, output_dim=output_dim).to("cuda")

    # Example dummy input (batch_size=32, sequence_length=10, input_dim=6)
    dummy_input = torch.randn(32, 10, input_dim).to('cuda')

    # Forward pass through the model
    output = model(dummy_input)
    print(output.shape)  # Expected output shape: [batch_size, output_dim]

On the other hand, this is my LSTM model:

# LSTM-based model class definition
class nxFFBnet_lstm(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, num_layers, context_length):
        super(nxFFBnet_lstm, self).__init__()

        # LSTM layer
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers, batch_first=True)

        # Fully connected layers
        self.fc1 = nn.Linear(hidden_dim, hidden_dim // 2)
        self.fc2 = nn.Linear(hidden_dim // 2, output_dim)

        # Activation function
        self.relu = nn.ReLU()

        # Dropout for regularization
        self.dropout = nn.Dropout(p=0.3)

    def forward(self, x):
        # LSTM layer
        lstm_out, _ = self.lstm(x)

        # Take the output of the last LSTM timestep
        lstm_out = lstm_out[:, -1, :]

        # Pass through fully connected layers with ReLU activations
        x = self.relu(self.fc1(lstm_out))
        x = self.dropout(x)
        output = self.fc2(x)

        return output

if __name__ == "__main__":
    # Define model hyperparameters
    input_dim = 6  # 6 input features: GPS Speed, nYaw, GPS Latitude, GPS Longitude, gLat_Motec, gLong_Motec
    hidden_dim = 128  # Number of LSTM hidden units
    output_dim = 1  # Output is a single scalar (qSteer)
    num_layers = 2  # Number of LSTM layers
    context_length = 10  # Lookback window size (timesteps)

    # Instantiate the model
    model = nxFFBnet_lstm(input_dim, hidden_dim, output_dim, num_layers, context_length).to('cuda')

    # Print the model structure
    print(model)

    # Example dummy input (batch_size=32, sequence_length=10, input_dim=6)
    dummy_input = torch.randn(32, context_length, input_dim).to('cuda')

    # Forward pass through the model
    output = model(dummy_input)

    # Print output shape (should be [32, 1], corresponding to batch_size and qSteer scalar output)
    print(output.shape)

Even though I get slightly better results with xLSTM, training and inference are considerably slower (at batch size = 1, i.e. inference, a forward pass takes 0.5 ms with LSTM and 15 ms with xLSTM). I guess this is because the original configuration is intended for NLP tasks, and the network is probably overfitting.
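
For reference, per-sample latency can be measured roughly along these lines (a sketch, not necessarily how the numbers above were obtained; CUDA kernels run asynchronously, so synchronize before reading the clock):

import time

model.eval()
x = torch.randn(1, 10, 6).to("cuda")  # batch_size = 1, as in inference

with torch.no_grad():
    for _ in range(10):               # warm-up iterations
        model(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(100):
        model(x)
    torch.cuda.synchronize()
    elapsed_ms = (time.perf_counter() - start) / 100 * 1e3
    print(f"{elapsed_ms:.2f} ms per forward pass")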

Which configuration would you recommend for time series prediction? Basically, I have six inputs, normalized between 0 and 1, and I want to predict another variable, also normalized between 0 and 1 (obviously, during inference this is de-normalized). Which activation functions, embeddings, xLSTM configuration, LR scheduler, etc. would you recommend for this purpose?

You are the best!

AI-Guru commented 1 month ago

Cool! No worries, there is no xLSTM source code in this repo. xLSTM is imported from NX-AI.

For time series, I would recommend XGBoost for the first experiment. Then a multi-layer perceptron. Then xLSTM. And then a time series predictor with a Llama backbone. Of course, any of these stages can be the final one.
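
For the XGBoost baseline, one common framing (a minimal sketch with dummy data; assumes xgboost is installed) is to flatten each lookback window into a single feature vector and regress the scalar target:

import numpy as np
import xgboost as xgb

context_length, n_features = 10, 6
n_samples = 1000

# Dummy data standing in for the real normalized series: each sample is a
# flattened lookback window, the target is the scalar to predict.
X = np.random.rand(n_samples, context_length * n_features)
y = np.random.rand(n_samples)

model = xgb.XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.05)
model.fit(X, y)
print(model.predict(X[:32]).shape)  # (32,)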

Cram3r95 commented 1 month ago

@AI-Guru I am quite experienced with time series prediction; my PhD focused on Multi-Agent Motion Prediction for Autonomous Driving. What I meant was: in your opinion, which is the most suitable configuration for the xLSTM backbone?

AI-Guru commented 1 month ago

@Cram3r95 this is ongoing research: https://arxiv.org/abs/2407.10240

Cram3r95 commented 1 month ago

@AI-Guru so in your opinion, what could be a potential configuration? I think it is overfitting now, since the number of blocks is intended for NLP. For instance, would a reduced configuration along these lines make sense (a sketch; the hyperparameters below are guesses, not validated values)?
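
# A smaller, mLSTM-only configuration to try (hyperparameters are guesses):
# fewer blocks and a narrower embedding should reduce both overfitting and
# latency for a 6-feature scalar regression task.
xlstm_cfg_small = {
    "mlstm_block": {
        "mlstm": {
            "conv1d_kernel_size": 4,
            "qkv_proj_blocksize": 4,
            "num_heads": 2
        }
    },
    "context_length": 10,
    "num_blocks": 2,       # down from 7
    "embedding_dim": 32,   # down from 128
    "slstm_at": []         # skip the sLSTM block entirely
}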