Hi, thanks so much for your work.
I am testing the model with multiple configs. When calling the step() method to get both the output and the states, I observed that models containing an sLSTM layer fail, because sLSTMLayer has no step method. Instead, to get the state from that layer, one must pass the argument return_last_state=True. As a result, the xLSTM language model cannot return its state through step(). This is the code I used:
from omegaconf import OmegaConf
import torch
from dacite import from_dict
from dacite import Config as DaciteConfig
from xlstm import xLSTMLMModel, xLSTMLMModelConfig

# Model config; block 1 is an sLSTM block, the other blocks are mLSTM blocks.
xlstm_cfg = """
vocab_size: 50304
mlstm_block:
  mlstm:
    conv1d_kernel_size: 4
    qkv_proj_blocksize: 4
    num_heads: 4
slstm_block:
  slstm:
    backend: vanilla
    num_heads: 4
    conv1d_kernel_size: 4
    bias_init: powerlaw_blockdependent
  feedforward:
    proj_factor: 1.3
    act_fn: gelu
context_length: 256
num_blocks: 7
embedding_dim: 128
slstm_at: [1]
"""
cfg = OmegaConf.create(xlstm_cfg)
cfg = from_dict(data_class=xLSTMLMModelConfig, data=OmegaConf.to_container(cfg), config=DaciteConfig(strict=True))
model = xLSTMLMModel(cfg)

# Batch of token ids for a full forward pass (not used by the step() call below).
x = torch.randint(0, 50304, size=(4, 256)).to("cpu")
model = model.to("cpu")

# Single-token step with input shape (1, 1); this call raises the error.
model.step(torch.Tensor([1]).unsqueeze(dim=0).long())
This is the error:
AttributeError: 'sLSTMLayer' object has no attribute 'step'
Could you add a step() method to the sLSTM layer so that its interface is consistent with the other layers? Thank you so much.
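As a possible stopgap, a thin wrapper could emulate step() on top of forward(..., return_last_state=True). The sketch below only illustrates the pattern with a toy layer in plain Python: ToyRecurrentLayer and StepAdapter are hypothetical names, not part of xlstm, and I am assuming that forward returns an (output, state_dict) pair when return_last_state=True and accepts the previous state back as keyword arguments. I have not verified this against the real sLSTMLayer.

```python
class ToyRecurrentLayer:
    """Hypothetical stand-in for a layer that has forward() but no step()."""

    def forward(self, xs, running_sum=0.0, return_last_state=False):
        # Toy recurrence: the output at each position is the running sum.
        ys = []
        for x in xs:
            running_sum = running_sum + x
            ys.append(running_sum)
        if return_last_state:
            return ys, {"running_sum": running_sum}
        return ys


class StepAdapter:
    """Adds a step() method on top of forward(..., return_last_state=True)."""

    def __init__(self, layer):
        self.layer = layer

    def step(self, x, **state):
        # Treat the single input as a length-1 sequence and feed the
        # previous state back in as keyword arguments.
        ys, new_state = self.layer.forward([x], return_last_state=True, **state)
        return ys[0], new_state


layer = StepAdapter(ToyRecurrentLayer())
state = {}
outputs = []
for token in [1.0, 2.0, 3.0]:
    y, state = layer.step(token, **state)
    outputs.append(y)

# Stepping token by token should match one full forward pass.
full_outputs = ToyRecurrentLayer().forward([1.0, 2.0, 3.0])
```

With a wrapper like this, the calling code can treat every layer as if it had step(), at the cost of one forward call per token. A native step() in the sLSTM layer would still be the cleaner fix.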