MCG-NJU / AdaMixer

[CVPR 2022 Oral] AdaMixer: A Fast-Converging Query-Based Object Detector

How to compute AdaMixer's FLOPs? #19

Closed: fushh closed this issue 1 year ago

fushh commented 1 year ago

I used tools/analysis_tools/get_flops.py to calculate AdaMixer's FLOPs and got the following results, which do not match the number reported in the paper. It also seems that SRShadowForFlops does not register any FLOPs.

  (5): AdaMixerDecoderStage(
    18.116 M, 13.462% Params, 1.812 GFLOPs, 1.762% FLOPs, 
    (loss_cls): FocalLoss(0.0 M, 0.000% Params, 0.0 GFLOPs, 0.000% FLOPs, )
    (loss_bbox): L1Loss(0.0 M, 0.000% Params, 0.0 GFLOPs, 0.000% FLOPs, )
    (fc_cls): Linear(0.021 M, 0.015% Params, 0.002 GFLOPs, 0.002% FLOPs, in_features=256, out_features=80, bias=True)
    (fc_reg): Linear(0.001 M, 0.001% Params, 0.0 GFLOPs, 0.000% FLOPs, in_features=256, out_features=4, bias=True)
    (loss_iou): GIoULoss(0.0 M, 0.000% Params, 0.0 GFLOPs, 0.000% FLOPs, )
    (attention): MultiheadAttention(
      0.0 M, 0.000% Params, 0.0 GFLOPs, 0.000% FLOPs, 
      (attn): MultiheadAttention(
        0.0 M, 0.000% Params, 0.0 GFLOPs, 0.000% FLOPs, 
        (out_proj): NonDynamicallyQuantizableLinear(0.0 M, 0.000% Params, 0.0 GFLOPs, 0.000% FLOPs, in_features=256, out_features=256, bias=True)
      )
      (proj_drop): Dropout(0.0 M, 0.000% Params, 0.0 GFLOPs, 0.000% FLOPs, p=0.0, inplace=False)
      (dropout_layer): Dropout(0.0 M, 0.000% Params, 0.0 GFLOPs, 0.000% FLOPs, p=0.0, inplace=False)
    )
    (attention_norm): LayerNorm(0.001 M, 0.000% Params, 0.0 GFLOPs, 0.000% FLOPs, (256,), eps=1e-05, elementwise_affine=True)
    (instance_interactive_conv_dropout): Dropout(0.0 M, 0.000% Params, 0.0 GFLOPs, 0.000% FLOPs, p=0.0, inplace=False)
    (instance_interactive_conv_norm): LayerNorm(0.001 M, 0.000% Params, 0.0 GFLOPs, 0.000% FLOPs, (256,), eps=1e-05, elementwise_affine=True)
    (ffn): FFN(
      1.051 M, 0.781% Params, 0.105 GFLOPs, 0.102% FLOPs, 
      (activate): ReLU(0.0 M, 0.000% Params, 0.0 GFLOPs, 0.000% FLOPs, inplace=True)
      (layers): Sequential(
        1.051 M, 0.781% Params, 0.105 GFLOPs, 0.102% FLOPs, 
        (0): Sequential(
          0.526 M, 0.391% Params, 0.053 GFLOPs, 0.051% FLOPs, 
          (0): Linear(0.526 M, 0.391% Params, 0.052 GFLOPs, 0.051% FLOPs, in_features=256, out_features=2048, bias=True)
          (1): ReLU(0.0 M, 0.000% Params, 0.0 GFLOPs, 0.000% FLOPs, inplace=True)
          (2): Dropout(0.0 M, 0.000% Params, 0.0 GFLOPs, 0.000% FLOPs, p=0.0, inplace=False)
        )
        (1): Linear(0.525 M, 0.390% Params, 0.052 GFLOPs, 0.051% FLOPs, in_features=2048, out_features=256, bias=True)
        (2): Dropout(0.0 M, 0.000% Params, 0.0 GFLOPs, 0.000% FLOPs, p=0.0, inplace=False)
      )
      (dropout_layer): Identity(0.0 M, 0.000% Params, 0.0 GFLOPs, 0.000% FLOPs, )
    )
    (ffn_norm): LayerNorm(0.001 M, 0.000% Params, 0.0 GFLOPs, 0.000% FLOPs, (256,), eps=1e-05, elementwise_affine=True)
    (cls_fcs): ModuleList(
      0.066 M, 0.049% Params, 0.007 GFLOPs, 0.006% FLOPs, 
      (0): Linear(0.066 M, 0.049% Params, 0.007 GFLOPs, 0.006% FLOPs, in_features=256, out_features=256, bias=True)
      (1): LayerNorm(0.001 M, 0.000% Params, 0.0 GFLOPs, 0.000% FLOPs, (256,), eps=1e-05, elementwise_affine=True)
      (2): ReLU(0.0 M, 0.000% Params, 0.0 GFLOPs, 0.000% FLOPs, inplace=True)
    )
    (reg_fcs): ModuleList(
      0.066 M, 0.049% Params, 0.007 GFLOPs, 0.006% FLOPs, 
      (0): Linear(0.066 M, 0.049% Params, 0.007 GFLOPs, 0.006% FLOPs, in_features=256, out_features=256, bias=True)
      (1): LayerNorm(0.001 M, 0.000% Params, 0.0 GFLOPs, 0.000% FLOPs, (256,), eps=1e-05, elementwise_affine=True)
      (2): ReLU(0.0 M, 0.000% Params, 0.0 GFLOPs, 0.000% FLOPs, inplace=True)
    )
    (sampling_n_mixing): AdaptiveSamplingMixing(
      16.909 M, 12.566% Params, 1.692 GFLOPs, 1.644% FLOPs, 
      (sampling_offset_generator): Sequential(
        0.099 M, 0.073% Params, 0.01 GFLOPs, 0.010% FLOPs, 
        (0): Linear(0.099 M, 0.073% Params, 0.01 GFLOPs, 0.010% FLOPs, in_features=256, out_features=384, bias=True)
      )
      (norm): LayerNorm(0.001 M, 0.000% Params, 0.0 GFLOPs, 0.000% FLOPs, (256,), eps=1e-05, elementwise_affine=True)
      (adaptive_mixing): AdaptiveMixing(
        16.81 M, 12.492% Params, 1.682 GFLOPs, 1.635% FLOPs, 
        (parameter_generator): Sequential(
          8.421 M, 6.258% Params, 0.839 GFLOPs, 0.815% FLOPs, 
          (0): Linear(8.421 M, 6.258% Params, 0.839 GFLOPs, 0.815% FLOPs, in_features=256, out_features=32768, bias=True)
        )
        (out_proj): Linear(8.389 M, 6.234% Params, 0.839 GFLOPs, 0.815% FLOPs, in_features=32768, out_features=256, bias=True)
        (act): ReLU(0.0 M, 0.000% Params, 0.004 GFLOPs, 0.004% FLOPs, inplace=True)
        (shadow): SRShadowForFlops(0.0 M, 0.000% Params, 0.0 GFLOPs, 0.000% FLOPs, )
      )
    )
  )
  init_cfg=[{'type': 'Normal', 'std': 0.01, 'override': {'name': 'fc_cls'}}, {'type': 'Normal', 'std': 0.001, 'override': {'name': 'fc_reg'}}]
)

==============================
Input shape: (3, 1280, 800)
Flops: 102.88 GFLOPs
Params: 134.57 M
==============================
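For reference, the numbers above come from the standard invocation of the script (the config path here is an assumption on my side; substitute whichever AdaMixer config you are measuring):

    python tools/analysis_tools/get_flops.py configs/adamixer/adamixer_r50_1x_coco.py --shape 1280 800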

Could you tell me how to get the exact FLOPs number?

sebgao commented 1 year ago

For now, we have no plan to release our customized mmcv, as it is not systematic or well structured. You can compute the extra FLOPs according to SRShadowForFlops manually, or resort to fvcore (a simple but comprehensive FLOP-counting tool, better than mmcv's).
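A minimal fvcore sketch of the second route (the config path and the forward_dummy shim are assumptions, not part of this repo's tooling): fvcore counts operators such as matmul/bmm/einsum directly while tracing, so it picks up the dynamic mixing multiplications that mmcv's module-hook counter misses, which is exactly what SRShadowForFlops tries to compensate for.

    import torch
    from mmcv import Config
    from mmdet.models import build_detector
    from fvcore.nn import FlopCountAnalysis, flop_count_table

    cfg = Config.fromfile('configs/adamixer/adamixer_r50_1x_coco.py')  # assumed config path
    model = build_detector(cfg.model, train_cfg=cfg.get('train_cfg'),
                           test_cfg=cfg.get('test_cfg'))
    # Route fvcore through the dummy forward that mmdet uses for FLOP counting;
    # query-based detectors may need a small shim if forward_dummy is not implemented.
    model.forward = model.forward_dummy
    model.eval()

    inputs = torch.randn(1, 3, 1280, 800)
    flops = FlopCountAnalysis(model, inputs)
    print(flop_count_table(flops))
    print(f'total: {flops.total() / 1e9:.2f} GFLOPs')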

fushh commented 1 year ago

When using the officially released mmcv-full==1.3.3, I get many import errors like:

ImportError: cannot import name 'to_2tuple' from 'mmcv.utils' (/home/proto/anaconda3/envs/mmcv133/lib/python3.7/site-packages/mmcv/utils/__init__.py)
ImportError: cannot import name 'trunc_normal_init' from 'mmcv.cnn' (/home/proto/anaconda3/envs/mmcv133/lib/python3.7/site-packages/mmcv/cnn/__init__.py)

Do these errors come from your customized mmcv?

sebgao commented 1 year ago

I believe this is a bug in mmcv. You can patch your code following this.
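For anyone hitting the same two imports on mmcv-full 1.3.3, a minimal local shim along these lines (a sketch only, not the official patch; it assumes PyTorch >= 1.7 for nn.init.trunc_normal_) can stand in for the missing symbols:

    import torch.nn as nn

    def to_2tuple(x):
        # Later mmcv versions repeat a scalar into a pair and pass sequences through.
        if isinstance(x, (list, tuple)):
            return tuple(x)
        return (x, x)

    def trunc_normal_init(module, mean=0.0, std=1.0, a=-2.0, b=2.0, bias=0.0):
        # Truncated-normal weight init plus constant bias, mirroring later mmcv.cnn.
        nn.init.trunc_normal_(module.weight, mean=mean, std=std, a=a, b=b)
        if hasattr(module, 'bias') and module.bias is not None:
            nn.init.constant_(module.bias, bias)

Use these definitions in place of the failing "from mmcv.utils import to_2tuple" and "from mmcv.cnn import trunc_normal_init" imports.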