Which table are you referring to when you say table-4? In our newly released arXiv paper, Table 4 is the architectural overview of the VMamba series.
In the appendix.
Sorry, in your latest paper it is Table 9 of the appendix. Why are the parameters and FLOPs I calculated with mmsegmentation (torch 1.12) so different from your results? I would like to know why.
I see. I suppose the Swin-T performance you tested is also worse than the results in Table 9 of our arXiv paper.
There are two reasons that may contribute to this:
1. The window size of Swin we used in this table is scaled: it equals the input resolution divided by 32. (If you check the config files in the original Swin repo, you will also find that design.) But in mmpretrain (or mmdet and mmseg), raising the image size does not automatically scale the window size accordingly (see the window-size sketch after this list).
2. fvcore does not support torch.nn.functional.scaled_dot_product_attention, so if this function is used when calculating FLOPs, you need to replace it with the naive implementation of scaled dot-product attention (see the second sketch below).
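To illustrate the first point, here is a minimal sketch of the resolution-dependent window size described above. The helper name is hypothetical, not code from the Swin or VMamba repos; it only encodes the resolution-divided-by-32 rule.

```python
# Hypothetical helper: scale Swin's window size with the input resolution,
# following the "resolution divided by 32" rule mentioned above.
def scaled_window_size(input_resolution: int) -> int:
    """Window size that grows with the input resolution.

    224 -> 7 (the default Swin-T window), 512 -> 16, 768 -> 24.
    The mm* toolboxes instead keep the configured window (7 by default
    for Swin-T), which is one source of the FLOPs mismatch.
    """
    return input_resolution // 32
```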
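For the second point, a minimal sketch of the naive scaled dot-product attention (no masking or dropout), assuming you only need something fvcore can trace for FLOP counting; in that setting it is mathematically equivalent to the fused torch.nn.functional.scaled_dot_product_attention:

```python
import math
import torch

def naive_scaled_dot_product_attention(q, k, v):
    # softmax(q @ k^T / sqrt(d)) @ v, written with plain matmuls so that
    # fvcore's FLOP counter can see (and count) the two matrix products.
    scale = 1.0 / math.sqrt(q.size(-1))
    attn = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    return attn @ v
```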
The code in https://github.com/MzeroMiko/VMamba/blob/546c58911f5b159aea8bac36648bb712f1861ccb/analyze/tp.py#L58 takes those two factors into account; you can easily test the FLOPs and throughput with it.
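If you prefer a generic measurement outside tp.py, a minimal fvcore sketch looks like the following; the placeholder model and input shape are assumptions, so swap in the backbone and resolution you actually want to measure.

```python
import torch
import torch.nn as nn
from fvcore.nn import FlopCountAnalysis, flop_count_table

# Placeholder model; replace with the backbone under test.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1),
)
x = torch.randn(1, 3, 512, 512)  # placeholder input resolution

flops = FlopCountAnalysis(model, x)
print(f"total FLOPs: {flops.total():,}")  # note: fvcore counts fused multiply-adds
print(flop_count_table(flops))            # per-module breakdown
```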
Thanks for the reply, I'll try again.
Why is there a huge difference between the FLOPs (Swin-T for Table 4) that I calculated with mmsegmentation's get_flops.py (torch 1.12) and the numbers in your paper? What is causing the discrepancy? Can you share what tool was used to calculate the FLOPs for each model in Table 4?