flairNLP / flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)
https://flairnlp.github.io/flair/

[Question]: Experimenting with xLSTM with Flair? #3454

Open None-Such opened 1 month ago

None-Such commented 1 month ago

Question

I just noticed a paper that came out this month on arXiv titled: xLSTM: Extended Long Short-Term Memory

It looks like the LSTM architecture behind Flair's original success has received a significant upgrade, leveraging techniques from modern LLMs.

The authors claim:

Exponential gating and modified memory structures boost xLSTM capabilities to perform favorably when compared to state-of-the-art Transformers and State Space Models, both in performance and scaling.
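For context on what "exponential gating" means here: below is a minimal scalar sketch of the stabilized sLSTM cell update as I understand it from the paper. The function and variable names are my own, and this is an illustration of the mechanism, not code from the paper or from Flair.

```python
import math

def slstm_step(c, n, m, z_t, i_pre, f_pre, o_gate):
    """One stabilized sLSTM cell step (scalar sketch, names are mine).

    c, n, m      -- cell state, normalizer, and stabilizer from the previous step
    z_t          -- candidate input (already squashed, e.g. by tanh)
    i_pre, f_pre -- raw pre-activations of the exponential input/forget gates
    o_gate       -- output gate activation in (0, 1)
    """
    # Stabilizer: track the running max exponent so exp() never overflows.
    m_new = max(f_pre + m, i_pre)
    i_gate = math.exp(i_pre - m_new)       # stabilized exponential input gate
    f_gate = math.exp(f_pre + m - m_new)   # stabilized exponential forget gate
    c_new = f_gate * c + i_gate * z_t      # cell state update
    n_new = f_gate * n + i_gate            # normalizer update
    h = o_gate * (c_new / n_new)           # normalized hidden state
    return c_new, n_new, m_new, h
```

The normalizer `n` divides the cell state so the hidden state stays bounded even though the gates are exponentials rather than sigmoids; the stabilizer `m` is the usual log-sum-exp trick, so even very large pre-activations stay numerically finite.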

I was curious whether anyone has experimented with xLSTM in Flair.

See - https://arxiv.org/abs/2405.04517

I appreciate that the best-scoring NER model in Flair is currently based on XLMRobertaModel. However, at 2.24 GB it is quite large and expensive to use for large-scale inference, while the original FlairEmbeddings-based models are a mere fraction of that size at 257 MB or 432 MB. It's an intriguing question whether such a new approach could benefit Flair.

Please share your thoughts.