[x] support fp16 training, which reduces GPU memory usage by 20-30%.
[x] fp16 training baseline for dino-r50-4scale-12ep: 49.1 AP (with amp) vs. 49.2 AP (w/o amp)
Note
For `MultiScaleDeformableAttention`, we simply cast the input value to `torch.float32` and cast the output back from `torch.float32` to `torch.float16`, which means we skip fp16 and run the `MultiScaleDeformableAttention` operator in fp32.
Did you observe instabilities when using the deformable attention layer with fp16? Is there another reason why the deformable attention layer cannot be used with fp16?
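Below is a minimal sketch of this fp32 fallback, written as a wrapper around an existing deformable-attention module; the class and argument names are illustrative assumptions, not the exact detrex implementation:

```python
import torch
import torch.nn as nn

class FP32DeformableAttention(nn.Module):
    """Illustrative wrapper: run a deformable-attention module in fp32 under amp."""

    def __init__(self, attn: nn.Module):
        super().__init__()
        self.attn = attn  # the underlying MultiScaleDeformableAttention module

    def forward(self, query, value, *args, **kwargs):
        out_dtype = value.dtype  # typically torch.float16 when amp is enabled
        # cast inputs up to fp32 so the deformable-attention kernel computes in full precision
        out = self.attn(query.float(), value.float(), *args, **kwargs)
        # cast the output back to fp16 for the rest of the amp-managed network
        return out.to(out_dtype)
```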
Usage
Start fp16 training with `train.amp.enabled`:
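A minimal example launch command is sketched below; it assumes a detrex-style `tools/train_net.py` launcher and the DINO-R50-4scale-12ep config path, so adjust both to your setup:

```bash
# Sketch: enable mixed-precision training by overriding the lazy-config key
# train.amp.enabled on the command line (paths are illustrative).
python tools/train_net.py \
    --config-file projects/dino/configs/dino_r50_4scale_12ep.py \
    --num-gpus 8 \
    train.amp.enabled=True
```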