ChenhongyiYang / PlainMamba

[BMVC 2024] PlainMamba: Improving Non-hierarchical Mamba in Visual Recognition
Apache License 2.0

Hi, would you like to report the training speed of your method? #3

Closed FanqingM closed 6 months ago

FanqingM commented 6 months ago

I have also tried a "zigzag" scan, but it is much slower than simply flattening the image into a 1D sequence. Could you report the training speed of your method?

Best regards!
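
For readers comparing the two orderings mentioned above, here is a minimal sketch of how they differ. It assumes "zigzag" means a snake-like traversal (even rows left to right, odd rows right to left); the helper names `flatten_order`, `zigzag_order`, and `invert` are hypothetical and are not taken from the PlainMamba code. It only illustrates the index bookkeeping and does not by itself explain the slowdown reported here.

```python
# Illustrative sketch only: hypothetical helpers, not code from the PlainMamba repo.

def flatten_order(height, width):
    """Row-major order: every row is read left to right."""
    return [r * width + c for r in range(height) for c in range(width)]


def zigzag_order(height, width):
    """Snake order (assumed meaning of "zigzag"): even rows left to right,
    odd rows right to left, so consecutive tokens stay spatially adjacent."""
    order = []
    for r in range(height):
        cols = range(width) if r % 2 == 0 else range(width - 1, -1, -1)
        order.extend(r * width + c for c in cols)
    return order


def invert(order):
    """Inverse permutation, used to restore the original layout after the scan."""
    inverse = [0] * len(order)
    for position, index in enumerate(order):
        inverse[index] = position
    return inverse


# A 3x4 grid of patch tokens, stored row-major.
tokens = list(range(12))
order = zigzag_order(3, 4)
scanned = [tokens[i] for i in order]             # reorder before the 1D scan
restored = [scanned[i] for i in invert(order)]   # undo the reordering afterwards
```

Since the index list depends only on the grid size, it can be computed once and reused for every batch, so the reordering itself reduces to a single gather in each direction.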

ChenhongyiYang commented 6 months ago

Hi, the training speed is indeed slow. For example, training an L2 model took around 4 days on 8 A100 GPUs. However, we believe this is mainly because we use the default selective scan function provided by the mamba-ssm package. (We used the default implementation because we wanted others to be able to reproduce our results with minimal effort :) )