ZhaoJingjing713 / HPR

[CVPR 2024] Hybrid Proposal Refiner: Revisiting DETR Series from the Faster R-CNN Perspective
MIT License

huge training cost! #1

Open fushh opened 1 month ago

fushh commented 1 month ago

HPR adds a huge training cost to the baselines:

  1. all three branches need bipartite matching to compute their losses
  2. Data Re-Augmentation doubles the training images (possibly equivalent to a longer training schedule)
ZhaoJingjing713 commented 1 month ago

Thank you for your comment and for pointing out the computational demands of our method. I agree that introducing auxiliary branches does increase computation. However, this strategy aligns with successful approaches in the field, such as H-DETR and Co-DETR, which also leverage auxiliary losses to achieve significant performance improvements. Our experiments, which trace an evolution from Faster R-CNN to the advanced DETR series, highlight two key performance factors: (1) semantically rich, fused encoder features and (2) efficient region feature refinement. HPR significantly enhances performance, demonstrating the value of these innovations despite the increased computational cost and confirming point (2) above.

Regarding your second point on Data Re-Augmentation, it's crucial to distinguish strategic data augmentation from a mere increase in training epochs, which can lead to overfitting. Our approach, similar to that detailed in paper [1], helps mitigate overfitting by diversifying the training data. This supports continued performance improvement even with extended training schedules, whereas simply lengthening training may result in overfitting and decreased performance. For a deeper dive, I recommend the experiments detailed in the cited paper.

I appreciate your insights and am open to discussing more efficient implementations or any further suggestions you might have!

[1] Augment your batch: better training with larger batches

JohnMBrandt commented 1 month ago

This work is so impressive! I was able to replicate your results with a ResNet backbone and have applied HPR to Align-DETR with a ResNet backbone, seeing improvements on my custom dataset. I am now trying to apply it to a ViT-L backbone and running into GPU memory issues. Do you have any suggestions for which modules to turn off to save GPU RAM? E.g., based on your supplementary material, it appears that removing the Regional CA auxiliary branch would give a better training cost / accuracy trade-off?

ZhaoJingjing713 commented 1 month ago

@JohnMBrandt Thanks for your interest. Yes, there are several ways to reduce the GPU memory cost:

  1. Set the num_cp. Gradient checkpointing is supported in our code: you can set num_cp in the config for both the encoder and decoder. Note that num_cp must be less than the number of encoder/decoder layers. We have also added ckpt_backbone and ckpt_neck options in the config, which you can enable to further reduce memory. However, checkpointing trades memory for a longer training time. A rough sketch of these keys is shown below.
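
    A minimal sketch of how these switches might look (num_cp, ckpt_backbone, and ckpt_neck come from the discussion above, but the exact nesting is an assumption; check the released config files):

    # Sketch only -- the exact placement of these keys may differ in the released configs.
    model = dict(
      encoder=dict(num_cp=4),   # checkpoint 4 encoder layers (< number of encoder layers)
      decoder=dict(num_cp=4),   # checkpoint 4 decoder layers (< number of decoder layers)
      ckpt_backbone=True,       # also checkpoint the backbone ...
      ckpt_neck=True)           # ... and the neck for further memory savings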

  2. Enable AMP. You can enable AMP training by setting type='AmpOptimWrapper' under optim_wrapper in the config; please refer to mmdetection for details. Note that you may need to cast the tensors to FP32 for the loss computation in the Hungarian matching, e.g.:
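
    As a concrete example, an AMP setup might look like the following (the hyper-parameter values and the cost-matrix names in the comment are illustrative assumptions, not the exact ones used in our configs):

    # Assumed example values; see mmdetection's AMP documentation for the exact recipe.
    optim_wrapper = dict(
      type='AmpOptimWrapper',
      loss_scale='dynamic',     # dynamic loss scaling for FP16 stability
      optimizer=dict(type='AdamW', lr=1e-4, weight_decay=1e-4))

    # Inside the Hungarian matcher, compute the matching cost in FP32, for example:
    #   with torch.cuda.amp.autocast(enabled=False):
    #       cost = cost_cls.float() + cost_bbox.float() + cost_giou.float()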

  3. Use gradient accumulation. Enable it by setting accumulative_counts under optim_wrapper:

    optim_wrapper = dict(
      type='AmpOptimWrapper',
      optimizer=dict(type='AdamW'),
      accumulative_counts=4)  # update the parameters once every 4 forward passes

    By applying this optim_wrapper, you can train with a smaller batch_size, such as 8 or 4. Please note that you may need to resample the input tensors in this setting: for example, with batch_size=4 and data re-augmentation enabled there are 8 image tensors in total, and additional code is needed to redistribute these 8 tensors across 8 GPUs.

  4. As you mentioned, you can flexibly remove an auxiliary branch, for example the Regional CA branch, to balance cost and accuracy; see the sketch below.
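
    Purely as an illustration (the key names below are hypothetical placeholders; the actual switches live in HPR's model config):

    # Hypothetical sketch -- check the released model configs for the real key names.
    model = dict(
      aux_branches=dict(
        regional_ca=False))   # disable the Regional CA auxiliary branch to save memory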