RangiLyu / nanodet

NanoDet-Plus⚡Super fast and lightweight anchor-free object detection model. 🔥Only 980 KB(int8) / 1.8MB (fp16) and run 97FPS on cellphone🔥
Apache License 2.0
5.71k stars, 1.04k forks

How to apply MobileViTV2 using Timm? #537

Closed (nijatmursali closed 10 months ago)

nijatmursali commented 11 months ago

I'm trying to use MobileViTV2, mentioned in this document, as the backbone via Timm. I found this model and modified the config file. Training starts without problems, but then I get:

RuntimeError: The size of tensor a (4165) must match the size of tensor b (1045) at non-singleton dimension 1

I traced the error to this line in /nanodet/model/head/nanodet_plus_head.py:

        dis_preds = self.distribution_project(reg_preds) * center_priors[..., 2, None]

where center_priors has shape torch.Size([32, 1045, 4]) and reg_preds has shape torch.Size([32, 4165, 32])
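
The two sizes are consistent with a stride mismatch between the feature maps and the head. The sketch below is only arithmetic, under two assumptions that are not stated in this post: a 224x224 input, and one prior per cell of a ceil(size / stride) grid at each level. `count_priors` is a hypothetical helper, not a NanoDet function.

```python
import math


def count_priors(input_size, strides):
    """Total grid points over all levels for a square input.

    Assumes each level has ceil(input_size / stride) cells per side.
    """
    return sum(math.ceil(input_size / s) ** 2 for s in strides)


INPUT_SIZE = 224  # assumption: the input size is not shown in the post

# Strides the head is configured with (from the config in this post).
head_priors = count_priors(INPUT_SIZE, [8, 16, 32, 64])

# Strides the feature maps would have if the backbone outputs were
# one stage too early (4, 8, 16) plus one extra downsampled level.
feature_points = count_priors(INPUT_SIZE, [4, 8, 16, 32])

print(head_priors, feature_points)
```

Under these assumptions the first count is 1045 and the second is 4165, which matches the error. If this reading is right, out_indices: [1, 2, 3] may be selecting backbone stages at strides 4, 8 and 16 rather than 8, 16 and 32. One way to check is to print feature_info.reduction() on the timm model created with features_only=True. If that confirms it, shifting out_indices by one stage and updating the fpn in_channels to the matching channel counts would be the likely fix.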

and distribution_project is just an Integral: self.distribution_project = Integral(self.reg_max)
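
For context, here is a minimal pure-Python sketch of what an Integral layer of this kind computes, as I understand it from the Generalized Focal Loss formulation: for each point, the 4 * (reg_max + 1) regression logits are split into four groups, each group is passed through a softmax, and the expected bin index is returned as the distance for that side. The function name `integral` and the per-point list interface are illustrative assumptions, not NanoDet's actual API.

```python
import math


def integral(logits, reg_max):
    """Expected bin index for each of the 4 box sides of one point.

    `logits` holds 4 * (reg_max + 1) values: one discrete distribution
    over the bins 0..reg_max for each side.
    """
    bins = reg_max + 1
    if len(logits) != 4 * bins:
        raise ValueError("expected 4 * (reg_max + 1) logits")
    distances = []
    for side in range(4):
        chunk = logits[side * bins:(side + 1) * bins]
        peak = max(chunk)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in chunk]
        total = sum(exps)
        # Softmax-weighted mean of the bin indices.
        distances.append(sum(i * e / total for i, e in enumerate(exps)))
    return distances


REG_MAX = 7  # matches reg_max in the config, so 4 * 8 = 32 logits per point
uniform = integral([0.0] * (4 * (REG_MAX + 1)), REG_MAX)
print(uniform)
```

The point of the sketch is that this step only reduces the last dimension from 32 to 4. The number of points (4165) passes through unchanged, so the mismatch with the 1045 center priors has to come from the feature map sizes, not from Integral itself.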

How can I solve this issue?

My config file looks like:

model:
  weight_averager:
    name: ExpMovingAverager
    decay: 0.9998
  arch:
    name: NanoDetPlus
    detach_epoch: 10
    backbone:
      name: TIMMWrapper
      model_name: mobilevitv2_100
      features_only: True
      pretrained: True
      out_indices: [1, 2, 3]
    fpn:
      name: GhostPAN
      in_channels: [128, 256, 384]
      out_channels: 128
      kernel_size: 5
      num_extra_level: 1
      use_depthwise: True
      activation: SiLU
    head:
      name: NanoDetPlusHead
      num_classes: 80
      input_channel: 128
      feat_channels: 128
      stacked_convs: 2
      kernel_size: 5
      strides: [ 8, 16, 32, 64 ]
      activation: SiLU
      reg_max: 7
      norm_cfg:
        type: BN
      loss:
        loss_qfl:
          name: QualityFocalLoss
          use_sigmoid: True
          beta: 2.0
          loss_weight: 1.0
        loss_dfl:
          name: DistributionFocalLoss
          loss_weight: 0.25
        loss_bbox:
          name: GIoULoss
          loss_weight: 2.0
    aux_head:
      name: SimpleConvHead
      num_classes: 80
      input_channel: 256
      feat_channels: 256
      stacked_convs: 4
      strides: [ 8, 16, 32, 64 ]
      activation: SiLU
      reg_max: 7