Open ggalan87 opened 3 years ago
Thanks for flagging this issue. If you are planning to add them, a pull request would be very welcome. Otherwise, I can add them myself.
@ggalan87 If you're not using distributed training, just replace them with `OptimizerHook` and `EpochBasedRunner` for now.
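A minimal sketch of that suggested config change, assuming the standard mmcv hook and runner names; the `max_epochs` value below is purely illustrative:

```python
# Hedged sketch: non-distributed replacement for DistOptimizerHook / EpochBasedRunnerAmp.
# When optimizer_config has no `type` key, mmdet typically builds a plain OptimizerHook.
fp16 = None
optimizer_config = dict(grad_clip=None)
runner = dict(type="EpochBasedRunner", max_epochs=36)  # max_epochs is illustrative
```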
So I did a fairly thorough inspection of the current code bases of mmcv, mmdetection, and apex, and found the following.
A few days ago the pull request https://github.com/open-mmlab/mmcv/pull/1013 was merged, which adds the functionality implemented in https://github.com/SwinTransformer/Swin-Transformer-Object-Detection/blob/6a979e2164e3fb0de0ca2546545013a4d71b2f7d/mmcv_custom/runner/epoch_based_runner.py#L20 for saving and resuming checkpoints while using fp16. Such functionality is therefore already included in mmcv versions >= 1.3.6. I also checked that the same things are stored in the state_dict as in the original NVIDIA implementation https://github.com/NVIDIA/apex/blob/082f999a6e18a3d02306e27482cc7486dab71a50/apex/contrib/optimizers/fp16_optimizer.py, just with different naming conventions.
As for the `DistOptimizerHook` (e.g. from configs/xcit/mask_rcnn_xcit_small_12_p16_3x_coco.py), the original config is as follows:
```python
# do not use mmdet version fp16
fp16 = None
optimizer_config = dict(
    type="DistOptimizerHook",
    update_interval=1,
    grad_clip=None,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=True,
)
```
It has to be modified either as:
```python
fp16 = None
optimizer_config = dict(
    type="Fp16OptimizerHook",
    grad_clip=None,
    coalesce=True,
    bucket_size_mb=-1,
)
```
or as:

```python
fp16 = dict(loss_scale=512.)
optimizer_config = dict(
    grad_clip=None,
    coalesce=True,
    bucket_size_mb=-1,
)
```
In both cases an `Fp16OptimizerHook` will be instantiated with `distributed=True` (the default): https://github.com/open-mmlab/mmcv/blob/b035fe9171ce1913beb911870099fc4dff6689c9/mmcv/runner/hooks/optimizer.py#L83
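To make the defaults concrete without requiring mmcv at hand, here is a hedged sketch that mirrors the constructor defaults at the linked source; this is not the real class, just a stand-in reproducing its default arguments:

```python
# Hedged sketch of the Fp16OptimizerHook defaults from the linked mmcv source.
# Passing only the keys present in the config leaves distributed=True untouched.
def fp16_optimizer_hook_defaults(grad_clip=None, coalesce=True,
                                 bucket_size_mb=-1, loss_scale=512.,
                                 distributed=True):
    return dict(grad_clip=grad_clip, coalesce=coalesce,
                bucket_size_mb=bucket_size_mb, loss_scale=loss_scale,
                distributed=distributed)

cfg = fp16_optimizer_hook_defaults(grad_clip=None, coalesce=True, bucket_size_mb=-1)
assert cfg["distributed"] is True  # the default when not overridden in the config
```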
Hello, as you mentioned, you borrowed some parts from the Swin-Transformer-Object-Detection repository for the detector part. It seems that some of them are mandatory for training but are not included in this repo; I am referring, for example, to `EpochBasedRunnerAmp` and `DistOptimizerHook`. I already have experience with mmcv/mmdetection, so I know how to include them myself, but are you going to include them officially in the repo?