NX-AI / vision-lstm

xLSTM as Generic Vision Backbone
GNU Affero General Public License v3.0
405 stars, 28 forks

Comparison with Vision-RWKV #1

Closed (Yaziwel closed this 2 months ago)

Yaziwel commented 3 months ago

A good job!

In addition to Mamba, I would like to understand how xLSTM compares with other linear-complexity models such as Vision-RWKV and RetNet.

BenediktAlkin commented 2 months ago

Hi, thanks for the suggestion!

In the upcoming update on arXiv, we also compare against Vision-RWKV.

Regarding RetNet, I assume you mean RMT. As RMT is a hierarchical architecture (while ViL is currently only an isotropic architecture), a fair comparison is difficult. Since ViL can exhibit linear complexity w.r.t. sequence length, it could also be used in a hierarchical architecture (similar to how retention is used in RMT), but we haven't explored this direction yet.
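
As a rough illustration of why linear complexity matters for hierarchical designs, the sketch below counts tokens per block for an isotropic and a hierarchical layout and sums a toy sequence-mixing cost. It is not taken from the ViL or RMT code: the function names, the 224-pixel input, the patch size of 16, the stage strides (4, 8, 16, 32), the block counts (2, 2, 6, 2), and the cost model (tokens raised to an exponent, ignoring feature dimension) are all assumptions chosen for illustration.

```python
def isotropic_tokens(image_size, patch_size, depth):
    """Token count seen by each block when resolution is fixed (isotropic)."""
    n = (image_size // patch_size) ** 2
    return [n] * depth


def hierarchical_tokens(image_size, strides, blocks_per_stage):
    """Token count seen by each block when each stage downsamples by its stride."""
    counts = []
    for stride, blocks in zip(strides, blocks_per_stage):
        counts.extend([(image_size // stride) ** 2] * blocks)
    return counts


def mixing_cost(token_counts, exponent):
    """Toy cost model: exponent=1 for linear mixing, exponent=2 for quadratic."""
    return sum(n ** exponent for n in token_counts)


# Hypothetical configurations, both with 12 blocks in total.
iso = isotropic_tokens(image_size=224, patch_size=16, depth=12)
hier = hierarchical_tokens(
    image_size=224, strides=(4, 8, 16, 32), blocks_per_stage=(2, 2, 6, 2)
)

for name, exponent in (("linear", 1), ("quadratic", 2)):
    ratio = mixing_cost(hier, exponent) / mixing_cost(iso, exponent)
    print(f"{name}: hierarchical / isotropic cost ratio = {ratio:.1f}")
```

Under these assumptions, the high-resolution early stages make the hierarchical layout only a few times more expensive than the isotropic one when mixing is linear, but tens of times more expensive when it is quadratic.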