Implement fp16 by torch.cuda.amp

When I tried to use fp16 to accelerate model training, I got the following error:

  File "train.py", line 107, in <module>
    main(opt)
  File "train.py", line 81, in main
    log_dict_train, _ = trainer.train(epoch, train_loader)
  File "/home/wx/hoi/PPDM-pt1/src/lib/trainers.py", line 143, in train
    ret, results = self.run_epoch(model_with_loss, epoch, data_loader)
  File "/home/wx/hoi/PPDM-pt1/src/lib/trainers.py", line 100, in run_epoch
    output, loss, loss_states = model_with_loss(batch)
  File "/home/wx/anaconda3/envs/torch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wx/anaconda3/envs/torch/lib/python3.7/site-packages/apex/parallel/distributed.py", line 560, in forward
    result = self.module(*inputs, **kwargs)
  File "/home/wx/anaconda3/envs/torch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wx/hoi/PPDM-pt1/src/lib/trainers.py", line 23, in forward
    outputs = self.model(batch['input'])
  File "/home/wx/anaconda3/envs/torch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wx/hoi/PPDM-pt1/src/lib/models/networks/pose_dla_dcn.py", line 376, in forward
    x = self.dla_up(x)
  File "/home/wx/anaconda3/envs/torch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wx/hoi/PPDM-pt1/src/lib/models/networks/pose_dla_dcn.py", line 305, in forward
    ida(layers, len(layers) - i - 2, len(layers))
  File "/home/wx/anaconda3/envs/torch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wx/hoi/PPDM-pt1/src/lib/models/networks/pose_dla_dcn.py", line 279, in forward
    layers[i] = upsample(project(layers[i]))
  File "/home/wx/anaconda3/envs/torch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wx/hoi/PPDM-pt1/src/lib/models/networks/pose_dla_dcn.py", line 251, in forward
    x = self.conv(x)
  File "/home/wx/anaconda3/envs/torch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wx/hoi/PPDM-pt1/src/lib/models/networks/DCNv2/dcn_v2.py", line 170, in forward
    self.deformable_groups,
  File "/home/wx/hoi/PPDM-pt1/src/lib/models/networks/DCNv2/dcn_v2.py", line 37, in forward
    ctx.deformable_groups,
RuntimeError: expected scalar type Float but found Half

To fix this, I added the torch.cuda.amp decorators to the forward and backward functions of _DCNv2. It seems to work well on my machine:
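For reviewers unfamiliar with the pattern, here is a minimal sketch of how the amp decorators can wrap a custom autograd function. It is not the actual _DCNv2 code: `Float32OnlyScale` and `run_demo` are hypothetical stand-ins, and I am assuming the decorators in question are `custom_fwd(cast_inputs=torch.float32)` and `custom_bwd` from torch.cuda.amp. The toy op raises the same error as the traceback above when it receives half inputs, which is roughly how a float-only compiled kernel behaves.

```python
import torch
from torch.cuda.amp import custom_fwd, custom_bwd


class Float32OnlyScale(torch.autograd.Function):
    """Hypothetical stand-in for _DCNv2: an op that only accepts float32."""

    @staticmethod
    @custom_fwd(cast_inputs=torch.float32)  # under autocast, cast inputs to float32
    def forward(ctx, x, weight):
        # Mimic a compiled extension that rejects half-precision tensors.
        if x.dtype != torch.float32 or weight.dtype != torch.float32:
            raise RuntimeError("expected scalar type Float but found Half")
        ctx.save_for_backward(x, weight)
        return x * weight

    @staticmethod
    @custom_bwd  # run backward with the same autocast state as forward
    def backward(ctx, grad_output):
        x, weight = ctx.saved_tensors
        return grad_output * weight, grad_output * x


def run_demo():
    """Run the op under autocast when CUDA is available, otherwise on CPU."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(4, 3, device=device, requires_grad=True)
    w = torch.randn(4, 3, device=device, requires_grad=True)
    if device == "cuda":
        with torch.cuda.amp.autocast():
            # Feed a half tensor on purpose; custom_fwd casts it back to float32.
            out = Float32OnlyScale.apply(x.half(), w)
    else:
        # Without CUDA there is no autocast region; the decorators pass through.
        out = Float32OnlyScale.apply(x, w)
    out.sum().backward()
    return out.dtype, tuple(x.grad.shape)


if __name__ == "__main__":
    print(run_demo())
```

With `cast_inputs=torch.float32`, the decorated forward runs with autocast disabled, so the float-only kernel never sees half tensors, while the rest of the network can still run in fp16.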

Ubuntu 18.04
RTX 2080Ti
CUDA 10.1
PyTorch 1.7.1

This is my test command. I think it still needs more careful experiments:

CUDA_VISIBLE_DEVICES=2,3 python -m torch.distributed.launch --nproc_per_node=2 --use_env train.py Hoidet --exp_id fp16_test --batch_size 24  --lr 3e-4 --gpus 0,1 --num_workers 2 --val_intervals 100000 --image_dir images/train2015 --load_model ../models/ctdet_coco_dla_2x.pth --dataset hico --dist --fp16

YueLiao / PPDM

Implement fp16 by torch.cuda.amp #49