NX-AI / xlstm

Official repository of the xLSTM.
https://www.nx-ai.com/
Apache License 2.0

Causal Language Modeling - GPT-like training and next-token prediction. #46

Closed · AI-Guru closed this issue 1 month ago

AI-Guru commented 3 months ago

Hi!

First and foremost, thanks a lot for making xLSTM open-source! This is fantastic!

I want to use xLSTM for next-token prediction, especially on symbolic music datasets.

After reading the paper, I think I am ready to go. I want to train an xLSTM the same way GPTs are trained: the full sequence is the input, and the same sequence shifted one position to the left is the target. Training is highly parallel because of the causal mask in the multi-head attention.

Now I wonder: would I train xLSTM on similar input/output pairs, i.e. token sequence in, shifted token sequence out?
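
For concreteness, here is a minimal sketch of the shifted-pair training step I have in mind. This is not from the paper or this repo; `model`, `optimizer`, and the assumption that the model maps `(batch, seq_len)` token ids to `(batch, seq_len, vocab_size)` logits (as a GPT-style LM head would) are all placeholders:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, tokens):
    """One GPT-style training step.

    tokens: (batch, seq_len + 1) integer token ids.
    model is assumed to map (batch, seq_len) ids to
    (batch, seq_len, vocab_size) logits.
    """
    inputs = tokens[:, :-1]    # full sequence as input
    targets = tokens[:, 1:]    # same sequence shifted one position left

    logits = model(inputs)     # all positions predicted in parallel
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # (batch * seq_len, vocab_size)
        targets.reshape(-1),                  # (batch * seq_len,)
    )

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```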

I got the impression from the paper that this is possible when using only mLSTM blocks, and that introducing an sLSTM block would break this parallel training.

Is that so?

kpoeppel commented 1 month ago

It is still possible to train this way, and it is typically also more efficient, since the time loop can stay within C++/CUDA.
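
In practice this means that even with an sLSTM block in the stack, the whole sequence still goes through a single forward call, with the recurrent time loop handled inside the fused kernel rather than in Python. A hedged sketch of that, with the configuration field names taken from this repo's README (they may differ across library versions, so treat them as assumptions):

```python
import torch
from xlstm import (
    xLSTMBlockStack,
    xLSTMBlockStackConfig,
    mLSTMBlockConfig,
    mLSTMLayerConfig,
    sLSTMBlockConfig,
    sLSTMLayerConfig,
)

# Config fields follow the README example; assumptions, not a fixed API.
cfg = xLSTMBlockStackConfig(
    mlstm_block=mLSTMBlockConfig(
        mlstm=mLSTMLayerConfig(
            conv1d_kernel_size=4, qkv_proj_blocksize=4, num_heads=4
        )
    ),
    slstm_block=sLSTMBlockConfig(
        slstm=sLSTMLayerConfig(
            backend="vanilla", num_heads=4, conv1d_kernel_size=4
        )
    ),
    slstm_at=[1],        # mix one sLSTM block into an otherwise mLSTM stack
    context_length=256,
    num_blocks=4,
    embedding_dim=128,
)
stack = xLSTMBlockStack(cfg)

x = torch.randn(2, 256, 128)   # (batch, seq_len, embedding_dim)
y = stack(x)                   # one call; no Python-level loop over time
assert y.shape == x.shape
```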