Open ggalan87 opened 3 years ago
Thanks for flagging this issue. If you are planning to add them, a pull request would be very welcome. Otherwise, I can add them myself.
@ggalan87 If you're not using distributed training, just replace them with `OptimizerHook` and `EpochBasedRunner` for now.
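A minimal sketch of that suggested config change, assuming the standard mmcv hook and runner names; the `max_epochs` value below is purely illustrative:

```python
# Hedged sketch: non-distributed replacement for DistOptimizerHook / EpochBasedRunnerAmp.
# When optimizer_config has no `type` key, mmdet typically builds a plain OptimizerHook.
fp16 = None
optimizer_config = dict(grad_clip=None)
runner = dict(type="EpochBasedRunner", max_epochs=36)  # max_epochs is illustrative
```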
So I did a fairly thorough inspection of the current code bases of mmcv, mmdetection, and apex, and found the following.
A few days ago the pull request https://github.com/open-mmlab/mmcv/pull/1013 was merged, which adds the functionality implemented in https://github.com/SwinTransformer/Swin-Transformer-Object-Detection/blob/6a979e2164e3fb0de0ca2546545013a4d71b2f7d/mmcv_custom/runner/epoch_based_runner.py#L20 for saving and resuming checkpoints while using fp16. Such functionality is therefore already included in mmcv versions >= 1.3.6. I also checked that the same things are stored in the state_dict as in the original NVIDIA implementation https://github.com/NVIDIA/apex/blob/082f999a6e18a3d02306e27482cc7486dab71a50/apex/contrib/optimizers/fp16_optimizer.py, just with different naming conventions.
As for the `DistOptimizerHook` (e.g. from configs/xcit/mask_rcnn_xcit_small_12_p16_3x_coco.py), the original config is as follows:
```python
# do not use mmdet version fp16
fp16 = None
optimizer_config = dict(
    type="DistOptimizerHook",
    update_interval=1,
    grad_clip=None,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=True,
)
```
It has to be modified either as:
```python
fp16 = None
optimizer_config = dict(
    type="Fp16OptimizerHook",
    grad_clip=None,
    coalesce=True,
    bucket_size_mb=-1,
)
```
or as:

```python
fp16 = dict(loss_scale=512.)
optimizer_config = dict(
    grad_clip=None,
    coalesce=True,
    bucket_size_mb=-1,
)
```
In both cases an `Fp16OptimizerHook` will be instantiated with `distributed=True` (the default): https://github.com/open-mmlab/mmcv/blob/b035fe9171ce1913beb911870099fc4dff6689c9/mmcv/runner/hooks/optimizer.py#L83
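To make the defaults concrete without requiring mmcv at hand, here is a hedged sketch that mirrors the constructor defaults at the linked source; this is not the real class, just a stand-in reproducing its default arguments:

```python
# Hedged sketch of the Fp16OptimizerHook defaults from the linked mmcv source.
# Passing only the keys present in the config leaves distributed=True untouched.
def fp16_optimizer_hook_defaults(grad_clip=None, coalesce=True,
                                 bucket_size_mb=-1, loss_scale=512.,
                                 distributed=True):
    return dict(grad_clip=grad_clip, coalesce=coalesce,
                bucket_size_mb=bucket_size_mb, loss_scale=loss_scale,
                distributed=distributed)

cfg = fp16_optimizer_hook_defaults(grad_clip=None, coalesce=True, bucket_size_mb=-1)
assert cfg["distributed"] is True  # the default when not overridden in the config
```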
Hello, as you mentioned, you borrowed some parts from the Swin-Transformer-Object-Detection repository for the detector part. It seems that some of them are mandatory for training but are not included in this repo; I am referring, for example, to `EpochBasedRunnerAmp` and `DistOptimizerHook`. I already have experience with mmcv/mmdetection, so I know how to include them myself, but are you going to include them officially in the repo?