AI-Guru opened this issue 3 months ago
Hi everyone,
I am very excited about xLSTM. Great and promising work!
Today, I am having trouble reproducing the model sizes from the paper, for example xLSTM[7:1] with 125M trainable parameters.
From the paper, I constructed the following config:
from omegaconf import OmegaConf
from dacite import from_dict
from xlstm.xlstm_lm_model import xLSTMLMModel, xLSTMLMModelConfig

# Load the config.
config_string = """
model:
  vocab_size: 50257
  num_blocks: 24
  embedding_dim: 384
  mlstm_block:
    mlstm:
      num_heads: 4
  slstm_block:
    slstm:
      num_heads: 4
  slstm_at: [3, 20]
  context_length: 2048
"""
config = OmegaConf.create(config_string)

# Create the model.
model_config = from_dict(xLSTMLMModelConfig, OmegaConf.to_container(config.model))
model = xLSTMLMModel(model_config)
print(model_config)
print(model)

# Get the number of parameters.
number_of_parameters = sum(p.numel() for p in model.parameters())
print(f"Number of parameters: {number_of_parameters:_}")
It yields:
Number of parameters: 60_575_792
This is roughly half of the expected 125M parameters. What did I miss?
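In case it helps, a quick way to see where the parameters go is to sum them per top-level submodule (plain PyTorch on the model object built above, nothing xLSTM-specific):

from collections import defaultdict

# Sum parameter counts per top-level submodule of the model built above.
per_module = defaultdict(int)
for name, parameter in model.named_parameters():
    per_module[name.split(".")[0]] += parameter.numel()
for module_name, count in sorted(per_module.items()):
    print(f"{module_name}: {count:_}")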
Cheers, Tristan
We have the same problem.
It should be an embedding dimension of 768. Where did you find the 384?
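If that is the issue, a minimal sketch of the adjusted config would look like this (everything else kept exactly as in the snippet above, with the same imports still in scope):

config_string = """
model:
  vocab_size: 50257
  num_blocks: 24
  embedding_dim: 768
  mlstm_block:
    mlstm:
      num_heads: 4
  slstm_block:
    slstm:
      num_heads: 4
  slstm_at: [3, 20]
  context_length: 2048
"""

# Rebuild the model with the larger embedding dimension and recount.
config = OmegaConf.create(config_string)
model_config = from_dict(xLSTMLMModelConfig, OmegaConf.to_container(config.model))
model = xLSTMLMModel(model_config)
print(f"Number of parameters: {sum(p.numel() for p in model.parameters()):_}")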