Cram3r95 opened this issue 1 month ago
Howdy @Cram3r95! Great work! I can already feel the pull request!
It is hard to locate the line where the error happens. Please provide the stacktrace.
@Cram3r95 ah no worries! I just saw it! You need to project your input feature space into the embedding space.
Try this:
# Define a model that applies an xLSTMBlockStack to time-series data
# (assumes: import torch.nn as nn, and a pre-built xLSTMBlockStack passed in)
class nxFFBnet(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, xlstm_stack):
        super(nxFFBnet, self).__init__()
        self.xlstm_stack = xlstm_stack  # pre-built xLSTMBlockStack
        self.fc_outward = nn.Linear(input_dim, hidden_dim)
        self.fc1 = nn.Linear(hidden_dim, hidden_dim // 2)
        self.fc2 = nn.Linear(hidden_dim // 2, output_dim)
        self.relu = nn.ReLU()

    def forward(self, x):
        # If the input arrives as [batch_size, input_dim, sequence_length], permute it
        # to [batch_size, sequence_length, input_dim] before the projection:
        # x = x.permute(0, 2, 1)
        # print("After permute:", x.shape)

        # Project the input features into the embedding space
        x = self.fc_outward(x)
        print("After fc_outward:", x.shape)

        # Apply the xLSTM stack
        x = self.xlstm_stack(x)  # Expected shape: [batch_size, sequence_length, hidden_dim]
        print("After stack:", x.shape)

        # Use the output from the last timestep
        x = x[:, -1, :]  # Shape: [batch_size, hidden_dim]
        print("After last timestep:", x.shape)

        # Pass through the fully connected layers
        x = self.relu(self.fc1(x))  # Shape: [batch_size, hidden_dim // 2]
        print("After fc1:", x.shape)
        return self.fc2(x)
Not sure if permute is really necessary. Please check.
@AI-Guru yes, I solved it myself. I still need to finish the experimental setup (LR scheduler, etc.). By the way, that code was not from your repo; I am using the xLSTM blocks from NX-AI, which I think is the most official implementation.
This is my xLSTM model:
import torch
import torch.nn as nn
from dacite import from_dict, Config as DaciteConfig
from xlstm import xLSTMBlockStack, xLSTMBlockStackConfig

# Define the configuration for xLSTM
xlstm_cfg = {
    "mlstm_block": {
        "mlstm": {
            "conv1d_kernel_size": 4,
            "qkv_proj_blocksize": 4,
            "num_heads": 4
        }
    },
    "slstm_block": {
        "slstm": {
            "backend": "cuda",
            "num_heads": 4,
            "conv1d_kernel_size": 4,
            "bias_init": "powerlaw_blockdependent"
        },
        "feedforward": {
            "proj_factor": 1.3,
            "act_fn": "gelu"
        }
    },
    "context_length": 10,
    "num_blocks": 7,
    "embedding_dim": 128,  # This is the hidden_dim expected by xLSTM
    "slstm_at": [1]
}

# Convert the dictionary into the xLSTMBlockStackConfig dataclass
config = from_dict(data_class=xLSTMBlockStackConfig, data=xlstm_cfg, config=DaciteConfig(strict=True))
#######################################
# Define the model with an embedding (or projection) layer
class nxFFBnet_xlstm(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(nxFFBnet_xlstm, self).__init__()
        # Linear projection layer to match input_dim to hidden_dim
        self.embedding = nn.Linear(input_dim, hidden_dim)  # Project from input_dim (6) to hidden_dim (128)
        self.xlstm_stack = xLSTMBlockStack(config).to("cuda")  # Use the xLSTMBlockStack
        self.fc1 = nn.Linear(hidden_dim, hidden_dim // 2)
        self.fc2 = nn.Linear(hidden_dim // 2, output_dim)
        self.relu = nn.ReLU()

    def forward(self, x):
        # Project the input features to the required hidden_dim (128)
        x = self.embedding(x)  # Shape: [batch_size, sequence_length, hidden_dim]
        # Apply xLSTM stack
        x = self.xlstm_stack(x)  # Shape: [batch_size, sequence_length, hidden_dim]
        # Use the output from the last timestep
        x = x[:, -1, :]  # Shape: [batch_size, hidden_dim]
        # Pass through fully connected layers
        x = self.relu(self.fc1(x))  # Shape: [batch_size, hidden_dim // 2]
        return self.fc2(x)  # Shape: [batch_size, output_dim]


if __name__ == "__main__":
    # Define model parameters
    input_dim = 6     # 6 input features (e.g., time-series variables)
    hidden_dim = 128  # The hidden dimension expected by xLSTM
    output_dim = 1    # Output is a scalar (e.g., qSteer)

    # Instantiate the model
    model = nxFFBnet_xlstm(input_dim=input_dim, hidden_dim=hidden_dim, output_dim=output_dim).to("cuda")

    # Example dummy input (batch_size=32, sequence_length=10, input_dim=6)
    dummy_input = torch.randn(32, 10, input_dim).to("cuda")

    # Forward pass through the model
    output = model(dummy_input)
    print(output.shape)  # Expected output shape: [batch_size, output_dim]
For comparison, this is my LSTM model:
import torch
import torch.nn as nn

# LSTM-based model class definition
class nxFFBnet_lstm(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, num_layers, context_length):
        super(nxFFBnet_lstm, self).__init__()
        # LSTM layer
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers, batch_first=True)
        # Fully connected layers
        self.fc1 = nn.Linear(hidden_dim, hidden_dim // 2)
        self.fc2 = nn.Linear(hidden_dim // 2, output_dim)
        # Activation function
        self.relu = nn.ReLU()
        # Dropout for regularization
        self.dropout = nn.Dropout(p=0.3)

    def forward(self, x):
        # LSTM layer
        lstm_out, _ = self.lstm(x)
        # Take the output of the last LSTM timestep
        lstm_out = lstm_out[:, -1, :]
        # Pass through fully connected layers with ReLU activation
        x = self.relu(self.fc1(lstm_out))
        x = self.dropout(x)
        output = self.fc2(x)
        return output


if __name__ == "__main__":
    # Define model hyperparameters
    input_dim = 6        # 6 input features: GPS Speed, nYaw, GPS Latitude, GPS Longitude, gLat_Motec, gLong_Motec
    hidden_dim = 128     # Number of LSTM hidden units
    output_dim = 1       # Output is a single scalar (qSteer)
    num_layers = 2       # Number of LSTM layers
    context_length = 10  # Lookback window size (timesteps)

    # Instantiate the model
    model = nxFFBnet_lstm(input_dim, hidden_dim, output_dim, num_layers, context_length).to("cuda")

    # Print the model structure
    print(model)

    # Example dummy input (batch_size=32, sequence_length=10, input_dim=6)
    dummy_input = torch.randn(32, context_length, input_dim).to("cuda")

    # Forward pass through the model
    output = model(dummy_input)

    # Print output shape (should be [32, 1], corresponding to batch_size and the qSteer scalar output)
    print(output.shape)
Even though I get slightly better results with xLSTM, training and inference are considerably slower (at batch size = 1, i.e. inference, a forward pass takes 0.5 ms with LSTM and 15 ms with xLSTM). I guess this is because the original configuration is designed for NLP tasks and the network is probably overfitting.
Which configuration would you recommend for time series prediction? Basically I have six inputs, normalized between 0 and 1, and I want to predict another variable that is also normalized between 0 and 1 (during inference this is obviously de-normalized). Which activation functions, embeddings, xLSTM configuration, LR scheduler, etc. would you recommend for this purpose?
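For context, this is the kind of lighter configuration I was thinking of trying; the reduced num_blocks, embedding_dim, and num_heads below are just my own guesses for a low-dimensional time-series setup, not values recommended anywhere:

# Sketch of a smaller, mLSTM-only configuration (assumed values, not an official recommendation)
xlstm_cfg_small = {
    "mlstm_block": {
        "mlstm": {
            "conv1d_kernel_size": 4,
            "qkv_proj_blocksize": 4,
            "num_heads": 2
        }
    },
    "context_length": 10,
    "num_blocks": 2,      # down from 7
    "embedding_dim": 64   # down from 128
}

# Same dacite conversion as above; slstm_block is omitted so only mLSTM blocks are used
config_small = from_dict(data_class=xLSTMBlockStackConfig, data=xlstm_cfg_small,
                         config=DaciteConfig(strict=True))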
You are the best!
Cool! No worries, there is no xLSTM source code in this repo. xLSTM is imported from NXAI.
For time series, I would recommend XGBoost for the first experiment, then a multi-layer perceptron, then xLSTM, and then a time-series predictor with a Llama backbone (a quick sketch of the first stage follows below). Of course, any of these stages can be the final one.
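To make the first stage concrete, here is a minimal sketch of the kind of XGBoost baseline I mean, assuming the lookback windows are simply flattened into feature vectors; the shapes and hyperparameters are placeholders:

import numpy as np
import xgboost as xgb

# Placeholder data: flattened lookback windows of shape [num_samples, context_length * input_dim]
num_samples, context_length, input_dim = 1000, 10, 6
X = np.random.rand(num_samples, context_length * input_dim)
y = np.random.rand(num_samples)  # one scalar target per window (e.g. qSteer)

# Gradient-boosted trees as a quick baseline (hyperparameters are placeholders)
model = xgb.XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.05)
model.fit(X, y)

pred = model.predict(X[:32])
print(pred.shape)  # (32,)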
@AI-Guru I am quite experienced with time series prediction; my PhD focused on multi-agent motion prediction for autonomous driving. What I meant was: in your opinion, which configuration is most suitable for the xLSTM backbone?
@Cram3r95 this is ongoing research: https://arxiv.org/abs/2407.10240
@AI-Guru so, in your opinion, what could a suitable configuration look like? I think it is overfitting at the moment with a number of blocks intended for NLP.
Hi @AI-Guru, I think this kind of network is really interesting for time series prediction. I want to predict a single scalar for a particular use case, given 6 inputs. According to the original code, it should be as follows:
Nevertheless, I keep receiving the same error:
RuntimeError: Given normalized_shape=[128], expected input with shape [*, 128], but got input of size[32, 6, 10]
Is it not possible to use the model for time series prediction?
Obviously my input will be something like:
batch, context_len, inputs -> e.g. 32, 10, 6
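For reference, a minimal sketch of the failing call, assuming the config defined earlier in the thread; the point is that the stack's LayerNorm expects the last dimension to equal embedding_dim (128), while my batches currently come out with the raw feature dimensions:

# Minimal sketch of the failing call (assumed, based on the error message above)
stack = xLSTMBlockStack(config).to("cuda")
x = torch.randn(32, 6, 10).to("cuda")  # raw batch, not yet projected to embedding_dim
out = stack(x)  # RuntimeError: expected input with shape [*, 128], got input of size [32, 6, 10]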