Open MDD-0928 opened 3 months ago
The hierarchical structure is certainly slower than ViT, but we do get throughput comparable to Swin in PyTorch 2. We are still working on making the model faster.
We have no plans to pretrain on ImageNet-21k for now due to limited resources; we may do it in the future.
To be honest, it is hard to say. In this implementation of selective scan, raising the dimension is nearly equivalent to raising the batch size, and it is not related to the seqlen. Changing the embed dim may also improve performance, but it seems tricky.
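To illustrate the point about dimension behaving like batch size, here is a minimal NumPy sketch. It is a simplified reference recurrence, not the repository's CUDA kernel: the function name `selective_scan_ref`, the tensor shapes, and the discretization are assumptions made for illustration. It only shows that each channel is scanned independently along the sequence, so adding channels adds parallel work in the same way that adding batch items does.

```python
import numpy as np

def selective_scan_ref(u, delta, A, Bm, Cm):
    """Simplified reference scan (assumed shapes, for illustration only).

    u, delta: (batch, dim, seqlen)
    A:        (dim, n_state)
    Bm, Cm:   (batch, n_state, seqlen), shared across channels
    Returns y with shape (batch, dim, seqlen).
    """
    batch, dim, seqlen = u.shape
    n_state = A.shape[1]
    h = np.zeros((batch, dim, n_state))
    y = np.empty((batch, dim, seqlen))
    for t in range(seqlen):  # the only sequential axis is seqlen
        dt = delta[:, :, t, None]                         # (batch, dim, 1)
        decay = np.exp(dt * A[None])                      # (batch, dim, n_state)
        inp = dt * Bm[:, None, :, t] * u[:, :, t, None]   # (batch, dim, n_state)
        h = decay * h + inp
        y[:, :, t] = np.sum(h * Cm[:, None, :, t], axis=-1)
    return y

rng = np.random.default_rng(0)
B, D, L, N = 2, 4, 16, 3
u = rng.standard_normal((B, D, L))
delta = rng.uniform(0.01, 0.1, (B, D, L))
A = -rng.uniform(0.5, 1.5, (D, N))
Bm = rng.standard_normal((B, N, L))
Cm = rng.standard_normal((B, N, L))

# Scan all channels at once.
y_full = selective_scan_ref(u, delta, A, Bm, Cm)

# Scan each channel on its own and stack the results.
y_split = np.concatenate(
    [selective_scan_ref(u[:, d:d + 1], delta[:, d:d + 1], A[d:d + 1], Bm, Cm)
     for d in range(D)],
    axis=1,
)

# True when no channel depends on any other channel.
channels_independent = np.allclose(y_full, y_split)
```

In this sketch the per-channel results match the full scan, which is the sense in which a larger dimension looks like a larger batch. It says nothing about actual kernel throughput, which depends on the real implementation and hardware.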
Dear authors: