Which table are you referring to when you say table-4? In our newly released arXiv paper, Table 4 is the architectural overview of the VMamba series.
In the appendix.
Sorry, in your latest paper it is Table 9 of the appendix. Why are the parameters and FLOPs I calculated with mmsegmentation (torch 1.12) so different from your results? I would like to know why.
I see. I suppose the Swin-T performance you tested is also worse than the results in Table 9 of our arXiv paper.
There are two reasons that may contribute to this:
1. The window size of Swin we used in this table is scaled: it equals the input resolution divided by 32. (If you check the config files in the original Swin repo, you will also find that design.) But in mmpretrain (or mmdet and mmseg), raising the image size does not automatically scale the window size accordingly (see the window-size sketch after this list).
2. fvcore does not support torch.nn.functional.scaled_dot_product_attention, so if this function is used when calculating FLOPs, you need to replace it with the naive implementation of scaled dot-product attention (see the second sketch below).
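To illustrate the first point, here is a minimal sketch of the resolution-dependent window size described above. The helper name is hypothetical, not code from the Swin or VMamba repos; it only encodes the resolution-divided-by-32 rule.

```python
# Hypothetical helper: scale Swin's window size with the input resolution,
# following the "resolution divided by 32" rule mentioned above.
def scaled_window_size(input_resolution: int) -> int:
    """Window size that grows with the input resolution.

    224 -> 7 (the default Swin-T window), 512 -> 16, 768 -> 24.
    The mm* toolboxes instead keep the configured window (7 by default
    for Swin-T), which is one source of the FLOPs mismatch.
    """
    return input_resolution // 32
```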
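For the second point, a minimal sketch of the naive scaled dot-product attention (no masking or dropout), assuming you only need something fvcore can trace for FLOP counting; in that setting it is mathematically equivalent to the fused torch.nn.functional.scaled_dot_product_attention:

```python
import math
import torch

def naive_scaled_dot_product_attention(q, k, v):
    # softmax(q @ k^T / sqrt(d)) @ v, written with plain matmuls so that
    # fvcore's FLOP counter can see (and count) the two matrix products.
    scale = 1.0 / math.sqrt(q.size(-1))
    attn = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    return attn @ v
```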
The code in https://github.com/MzeroMiko/VMamba/blob/546c58911f5b159aea8bac36648bb712f1861ccb/analyze/tp.py#L58 takes those two factors into account; you can easily test the FLOPs and throughput with it.
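If you prefer a generic measurement outside tp.py, a minimal fvcore sketch looks like the following; the placeholder model and input shape are assumptions, so swap in the backbone and resolution you actually want to measure.

```python
import torch
import torch.nn as nn
from fvcore.nn import FlopCountAnalysis, flop_count_table

# Placeholder model; replace with the backbone under test.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1),
)
x = torch.randn(1, 3, 512, 512)  # placeholder input resolution

flops = FlopCountAnalysis(model, x)
print(f"total FLOPs: {flops.total():,}")  # note: fvcore counts fused multiply-adds
print(flop_count_table(flops))            # per-module breakdown
```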
Thanks for the reply, I'll try again.
Why is there a huge difference between the FLOPs (Swin-T for Table 4) that I calculated with mmsegmentation's get_flops.py (torch 1.12) and the numbers in your paper? What is causing the discrepancy? Can you share what tool was used to calculate the FLOPs for each model in Table 4?