NX-AI / xlstm

Official repository of the xLSTM.
https://www.nx-ai.com/
Apache License 2.0

How to run the code with a certain batch size? #53

Open Cram3r95 opened 1 month ago

Cram3r95 commented 1 month ago

This code is working:

import torch

from xlstm import (
    xLSTMBlockStack,
    xLSTMBlockStackConfig,
    mLSTMBlockConfig,
    mLSTMLayerConfig,
    sLSTMBlockConfig,
    sLSTMLayerConfig,
    FeedForwardConfig,
)

cfg = xLSTMBlockStackConfig(
    mlstm_block=mLSTMBlockConfig(
        mlstm=mLSTMLayerConfig(
            conv1d_kernel_size=4, qkv_proj_blocksize=4, num_heads=4
        )
    ),
    slstm_block=sLSTMBlockConfig(
        slstm=sLSTMLayerConfig(
            backend="cuda",
            num_heads=4,
            conv1d_kernel_size=4,
            bias_init="powerlaw_blockdependent",
        ),
        feedforward=FeedForwardConfig(proj_factor=1.3, act_fn="gelu"),
    ),
    context_length=256,
    num_blocks=7,
    embedding_dim=128,
    slstm_at=[1],
)

xlstm_stack = xLSTMBlockStack(cfg)

x = torch.randn(4, 256, 128).to("cuda")  # (batch, context_length, embedding_dim)
xlstm_stack = xlstm_stack.to("cuda")
y = xlstm_stack(x)
assert y.shape == x.shape  # the stack preserves (batch, seq, embedding)

But the network continuously reports an error if you try to add an extra batch dimension to the input, e.g.:

x = torch.randn(32, 4, 256, 128).to("cuda") # (where 32 is the batch size)

You get the following error:

File "/home/carlosgomezh/.local/lib/python3.10/site-packages/xlstm/blocks/mlstm/layer.py", line 102, in forward B, S, _ = x.shape ValueError: too many values to unpack (expected 3)

In your example it is a backbone processing a single (batch, seq, embedding) tensor.
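
For context, the unpacking failure can be reproduced in isolation; this is a minimal sketch, assuming (as the traceback suggests) that the layer's forward unpacks exactly three dimensions:

import torch

x3 = torch.randn(4, 256, 128)      # (batch, seq, embedding): unpacks fine
B, S, _ = x3.shape

x4 = torch.randn(32, 4, 256, 128)  # four dimensions: the same unpacking fails
try:
    B, S, _ = x4.shape
except ValueError as err:
    print(err)  # too many values to unpack (expected 3)

So the stack only ever sees 3-D inputs, and any batch has to live in dim 0.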

Is it possible to process something like this:

if __name__ == "__main__":
    # Define model hyperparameters
    input_dim = 6  
    hidden_dim = 128  
    output_dim = 1  
    num_layers = 2  
    context_length = 10  

    # Instantiate the model
    model = xLSTM(input_dim, hidden_dim, output_dim, num_layers, context_length).to('cuda')

    # Print the model structure
    print(model)

    # Example dummy input (batch_size=32, sequence_length=10, input_dim=6)
    dummy_input = torch.randn(32, context_length, input_dim).to('cuda')

    # Forward pass through the model
    output = model(dummy_input)
    print(output.shape)

Where there are 6 input features, the hidden dim of the network is 128 (for example), the output dim is 1, and the context length is 10? Obviously 32 represents the batch size.

If I run that code, I get the following error:

File "/home/carlosgomezh/.local/lib/python3.10/site-packages/torch/nn/functional.py", line 2573, in layer_norm return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled) RuntimeError: Given normalized_shape=[128], expected input with shape [*, 128], but got input of size[32, 10, 6]

@kpoeppel @maximilianmbeck

kpoeppel commented 1 month ago

@Cram3r95 I think you have the wrong approach here: the size 4 in your example above is already considered the batch size, as the heads are only internal and not exposed.
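
Concretely (a minimal sketch following that reading, reusing xlstm_stack from the first snippet), batch size 32 just means enlarging dim 0 of the 3-D input:

x = torch.randn(32, 256, 128).to("cuda")  # (batch=32, context_length=256, embedding_dim=128)
y = xlstm_stack(x)
assert y.shape == (32, 256, 128)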