AI-Guru opened this issue 3 months ago
Hi everyone,
I am very excited about xLSTM. Great and promising work!
Today, I am having trouble reproducing the model sizes from the paper, for example xLSTM[7:1] with 125M trainable parameters.
From the paper, I constructed the following config:
from omegaconf import OmegaConf
from dacite import from_dict
from xlstm.xlstm_lm_model import xLSTMLMModel, xLSTMLMModelConfig

# Load the config.
config_string = """
model:
  vocab_size: 50257
  num_blocks: 24
  embedding_dim: 384
  mlstm_block:
    mlstm:
      num_heads: 4
  slstm_block:
    slstm:
      num_heads: 4
  slstm_at: [3, 20]
  context_length: 2048
"""
config = OmegaConf.create(config_string)

# Create the model.
model_config = from_dict(xLSTMLMModelConfig, OmegaConf.to_container(config.model))
model = xLSTMLMModel(model_config)
print(model_config)
print(model)

# Get the number of parameters.
number_of_parameters = sum(p.numel() for p in model.parameters())
print(f"Number of parameters: {number_of_parameters:_}")
It yields:
Number of parameters: 60_575_792
This is roughly half of the expected 125M parameters. What did I miss?
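In case it helps, a quick way to see where the parameters go is to sum them per top-level submodule (plain PyTorch on the model object built above, nothing xLSTM-specific):

from collections import defaultdict

# Sum parameter counts per top-level submodule of the model built above.
per_module = defaultdict(int)
for name, parameter in model.named_parameters():
    per_module[name.split(".")[0]] += parameter.numel()
for module_name, count in sorted(per_module.items()):
    print(f"{module_name}: {count:_}")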
Cheers, Tristan
We have the same problem.
It should be an embedding dimension of 768. Where did you find the 384?
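If that is the issue, a minimal sketch of the adjusted config would look like this (everything else kept exactly as in the snippet above, with the same imports still in scope):

config_string = """
model:
  vocab_size: 50257
  num_blocks: 24
  embedding_dim: 768
  mlstm_block:
    mlstm:
      num_heads: 4
  slstm_block:
    slstm:
      num_heads: 4
  slstm_at: [3, 20]
  context_length: 2048
"""

# Rebuild the model with the larger embedding dimension and recount.
config = OmegaConf.create(config_string)
model_config = from_dict(xLSTMLMModelConfig, OmegaConf.to_container(config.model))
model = xLSTMLMModel(model_config)
print(f"Number of parameters: {sum(p.numel() for p in model.parameters()):_}")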