flairNLP / flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)
https://flairnlp.github.io/flair/

[Question]: Experimenting with xLSTM with Flair? #3454

Open None-Such opened 1 month ago

None-Such commented 1 month ago

Question

I just noticed a paper that came out this month on arXiv titled: xLSTM: Extended Long Short-Term Memory

It looks like the LSTM architecture behind Flair's original success has received a significant upgrade, leveraging techniques from modern LLMs.

The authors claim:

Exponential gating and modified memory structures boost xLSTM capabilities to perform favorably when compared to state-of-the-art Transformers and State Space Models, both in performance and scaling.
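For context on what "exponential gating" means here: below is a minimal scalar sketch of the stabilized sLSTM cell update as I understand it from the paper. The function and variable names are my own, and this is an illustration of the mechanism, not code from the paper or from Flair.

```python
import math

def slstm_step(c, n, m, z_t, i_pre, f_pre, o_gate):
    """One stabilized sLSTM cell step (scalar sketch, names are mine).

    c, n, m      -- cell state, normalizer, and stabilizer from the previous step
    z_t          -- candidate input (already squashed, e.g. by tanh)
    i_pre, f_pre -- raw pre-activations of the exponential input/forget gates
    o_gate       -- output gate activation in (0, 1)
    """
    # Stabilizer: track the running max exponent so exp() never overflows.
    m_new = max(f_pre + m, i_pre)
    i_gate = math.exp(i_pre - m_new)       # stabilized exponential input gate
    f_gate = math.exp(f_pre + m - m_new)   # stabilized exponential forget gate
    c_new = f_gate * c + i_gate * z_t      # cell state update
    n_new = f_gate * n + i_gate            # normalizer update
    h = o_gate * (c_new / n_new)           # normalized hidden state
    return c_new, n_new, m_new, h
```

The normalizer `n` divides the cell state so the hidden state stays bounded even though the gates are exponentials rather than sigmoids; the stabilizer `m` is the usual log-sum-exp trick, so even very large pre-activations stay numerically finite.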

I was curious whether anyone has experimented with xLSTM in Flair.

See - https://arxiv.org/abs/2405.04517

I appreciate that the best-scoring NER model in Flair is currently based on XLMRobertaModel. However, at 2.24 GB it is quite large and expensive to use for large-scale inference, while the original FlairEmbeddings-based models are a mere fraction of that size at 257 MB or 432 MB. It's an intriguing question whether such a new approach could benefit Flair.

Please share your thoughts.