andrewgcodes/xlstm

my attempts at implementing various bits of Sepp Hochreiter's new xLSTM architecture
MIT License

How do I use it like LSTMLayer in torch #2

Open DDCY220 opened 1 month ago

DDCY220 commented 1 month ago

I want to use this code like torch.nn.LSTM, but I'm having problems with data shaped (batch, len, emb). I sincerely request that you provide a batch-first version of the code.

sidd462 commented 1 month ago

If you prefer to have the batch dimension first, pass batch_first=True when constructing the nn.LSTM. This changes the expected input (and returned output) shape to (batch, seq_len, input_size). For example, with variable-length sequences that need padding and packing:

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

# Example parameters
input_size = 10  # Number of features per timestep
hidden_size = 20 # Number of features in hidden state
num_layers = 2   # Number of recurrent layers

# Creating the LSTM
lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)

# Example data (list of tensors of varying lengths)
data = [torch.randn(5, input_size), torch.randn(3, input_size), torch.randn(6, input_size)]

# Padding sequences and creating a batch
padded_data = pad_sequence(data, batch_first=True)
lengths = torch.tensor([len(x) for x in data])

# Packing the padded sequences (enforce_sorted=False permits unsorted lengths)
packed_input = pack_padded_sequence(padded_data, lengths.cpu(), batch_first=True, enforce_sorted=False)

# Feeding the packed batch to the LSTM
packed_output, (hidden, cell) = lstm(packed_input)

# Unpacking the output
output, output_lengths = pad_packed_sequence(packed_output, batch_first=True)

# Processing outputs...
print(output.shape)  # torch.Size([3, 6, 20]): (batch, max_seq_len, num_directions * hidden_size)
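
One point worth flagging: batch_first=True only affects the input and output tensors. The hidden and cell states returned by nn.LSTM always keep the shape (num_layers * num_directions, batch, hidden_size):

print(hidden.shape)  # torch.Size([2, 3, 20]): (num_layers, batch, hidden_size)
print(cell.shape)    # torch.Size([2, 3, 20]): (num_layers, batch, hidden_size)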
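
As for using this repo's cells the same way: the xLSTM implementations here don't ship a torch.nn.LSTM-style batched interface, which is what this issue is really asking for. Below is a minimal sketch of how one could adapt a per-timestep cell to (batch, seq_len, emb) input; the cell signature cell(x_t, state) -> (h_t, state) is an assumption for illustration, not the actual API of the sLSTM/mLSTM code in this repo.

import torch
import torch.nn as nn

class BatchFirstWrapper(nn.Module):
    """Sketch: run a per-timestep recurrent cell over (batch, seq_len, emb) input.

    Assumes a hypothetical cell with signature cell(x_t, state) -> (h_t, state);
    adapt this to whatever interface the repo's cells actually expose.
    """
    def __init__(self, cell):
        super().__init__()
        self.cell = cell

    def forward(self, x, state=None):
        # x: (batch, seq_len, emb)
        outputs = []
        for t in range(x.size(1)):                    # step along the time axis
            h_t, state = self.cell(x[:, t, :], state)
            outputs.append(h_t)
        # stack per-step hidden states back into (batch, seq_len, hidden_size)
        return torch.stack(outputs, dim=1), state

A Python-level time loop like this is much slower than the fused kernels behind nn.LSTM, but it is enough to prototype the cells in this repo against batched data.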