SwinTransformer / Swin-Transformer-Object-Detection

This is an official implementation of "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" for object detection and instance segmentation.
https://arxiv.org/abs/2103.14030
Apache License 2.0

Using multiscale for training and inference #76

Open lingcong-k opened 3 years ago

lingcong-k commented 3 years ago

Hi, I am training on my own dataset, but I am not using multi-scale training or inference.

The config I am using is '../base/models/cascade_mask_rcnn_swin_fpn.py', but for the data loading and augmentation part I am not using coco_instance.py.

I do use the pretrained Cascade Mask R-CNN weights, and they work without any issues.
So my questions are: 1) Were those pretrained weights trained without multi-scale? 2) How can I use multi-scale training? I don't see any part that combines prediction logits from multiple scales. (I assume "multi-scale" refers to the approach in this paper: Hierarchical Multi-Scale Attention for Semantic Segmentation?)

Thanks in advance :)

impiga commented 3 years ago

Hi, thanks for your interest.

  1. All released pretrained weights are trained with multi-scale inputs.
  2. To use multi-scale training, you may refer to this config file: https://github.com/SwinTransformer/Swin-Transformer-Object-Detection/blob/6a979e2164e3fb0de0ca2546545013a4d71b2f7d/configs/swin/cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py#L88-L116
  3. In addition, "multi-scale" in our experiments means that inputs in different iterations or on different GPUs may have different scales. These scales are sampled according to the rules in the config file referenced in point 2. In other words, in each iteration, on each GPU, there is only one input scale.
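To make point 3 concrete, here is a toy Python sketch of that sampling rule. It is not the repo's or MMDetection's actual data pipeline. It assumes short-side candidates of 480, 512, ..., 800 with the long side capped at 1333 (inferred from the `mstrain_480-800` config name, not copied from the config file), and a keep-ratio resize similar in spirit to MMDetection's `Resize` with `keep_ratio=True`. The names `SHORT_SIDES`, `MAX_LONG_SIDE`, `sample_scale`, `keep_ratio_size` and `simulate` are hypothetical.

```python
import random
from typing import List, Tuple

# Hypothetical candidate short sides (480, 512, ..., 800) and long-side cap,
# inferred from the "mstrain_480-800" config name; not read from the config.
SHORT_SIDES: List[int] = list(range(480, 801, 32))
MAX_LONG_SIDE = 1333


def sample_scale(rng: random.Random) -> Tuple[int, int]:
    """Draw ONE target scale (long, short); called once per iteration per GPU."""
    return (MAX_LONG_SIDE, rng.choice(SHORT_SIDES))


def keep_ratio_size(w: int, h: int, scale: Tuple[int, int]) -> Tuple[int, int]:
    """Largest (w, h) with the same aspect ratio that fits inside `scale`."""
    max_long, max_short = max(scale), min(scale)
    factor = min(max_long / max(w, h), max_short / min(w, h))
    return int(w * factor + 0.5), int(h * factor + 0.5)


def simulate(num_iters: int, num_gpus: int, image_wh: Tuple[int, int],
             seed: int = 0) -> List[Tuple[int, int, Tuple[int, int]]]:
    """Return (iteration, gpu, resized_wh) with one sampled scale per (iter, gpu)."""
    rng = random.Random(seed)
    plan = []
    for it in range(num_iters):
        for gpu in range(num_gpus):
            # In this toy model, every image on this GPU in this step uses this scale.
            scale = sample_scale(rng)
            plan.append((it, gpu, keep_ratio_size(*image_wh, scale)))
    return plan


if __name__ == "__main__":
    for it, gpu, (w, h) in simulate(num_iters=3, num_gpus=2, image_wh=(640, 480)):
        print(f"iter {it} gpu {gpu}: input resized to {w}x{h}")
```

Each (iteration, GPU) pair sees a single scale, so nothing is fused across scales inside one forward pass; the scale variety comes only from sampling across steps and devices.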