Closed Yaziwel closed 2 months ago
Hi, thanks for the suggestion!
In the upcoming update on arXiv, we also compare against Vision-RWKV.
Regarding RetNet, I assume you mean RMT. Since RMT is a hierarchical architecture (while ViL is currently only an isotropic architecture), a fair comparison is difficult. Because ViL can exhibit linear complexity w.r.t. sequence length, it could also be used in a hierarchical architecture (similar to how retention is used in RMT), but we haven't explored this direction yet.
Great work!
In addition to Mamba, I would like to understand how xLSTM compares with other linear-complexity models such as Vision-RWKV and RetNet.