ShoufaChen / DiffusionDet

[ICCV2023 Best Paper Finalist] PyTorch implementation of DiffusionDet (https://arxiv.org/abs/2211.09788)

Questions about DDIM's performance #16

Open LOLTATQAQ opened 1 year ago

LOLTATQAQ commented 1 year ago

Hi,

Thanks for sharing your wonderful work. I am having trouble assessing the effectiveness of the DDIM process discussed in the paper. Since the paper contains no ablation study on it, I ran experiments following the instructions and using the provided checkpoints. For example, I chose the diffdet.coco.res50.yaml config and the COCO Res50 checkpoint.

  1. The model is evaluated with four iterations. This setting gives 46.34 mAP. diffdet_step4.log
  2. The model is evaluated with four iterations, with the time step fixed to its initial values in every iteration (for example, 999). This setting gives 46.32 mAP. diffdet_step4_fix999_749.log
  3. The model is evaluated with four iterations, with completely new random boxes in each iteration. This setting gives 46.29 mAP. diffdet_step4_fix999_749_random.log

The modified detector.py is available in detector_FixandRand.zip
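
To make the three settings concrete, here is a minimal sketch of the per-iteration schedule each one implies. It is not the code from detector.py or from the attached zip: the function names, the setting labels, and the assumption that the sampling timesteps are evenly spaced from 999 down to -1 over 1000 training timesteps are all illustrative. The even spacing is at least consistent with the pair time=999, time_next=749 mentioned for four iterations.

```python
def ddim_time_pairs(total_timesteps=1000, sampling_steps=4):
    """Evenly spaced (time, time_next) pairs from total_timesteps - 1 down to -1.

    Assumption: linear spacing with integer truncation; the real schedule
    in the repository may differ.
    """
    times = [-1 + (i * total_timesteps) // sampling_steps
             for i in range(sampling_steps + 1)]
    times.reverse()
    return list(zip(times[:-1], times[1:]))


def evaluation_plan(setting, sampling_steps=4, total_timesteps=1000):
    """Return one (time, time_next, box_source) tuple per iteration.

    setting is a hypothetical label for the three experiments:
      "ddim"          dynamic time steps, boxes carried over between iterations
      "fixed"         every iteration reuses the first (time, time_next) pair
      "fixed_random"  as "fixed", but boxes are redrawn at random each iteration
    """
    pairs = ddim_time_pairs(total_timesteps, sampling_steps)
    if setting != "ddim":
        pairs = [pairs[0]] * sampling_steps  # freeze the initial time step
    plan = []
    for i, (time, time_next) in enumerate(pairs):
        if i == 0 or setting == "fixed_random":
            source = "fresh_random"   # boxes sampled from noise
        else:
            source = "carried_over"   # boxes updated from the previous iteration
        plan.append((time, time_next, source))
    return plan


if __name__ == "__main__":
    for name in ("ddim", "fixed", "fixed_random"):
        print(name, evaluation_plan(name))
```

Under these assumptions the only differences between the settings are which time step conditions the detector in each iteration and where each iteration's input boxes come from.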

It seems that the performance gain introduced by the DDIM process is at most 0.05 mAP, which does not seem significant for object detection.

I further tried six iterations with the time step fixed to the same initial values as in the four-iteration run (time=999, time_next=749). This gives 46.44 mAP. However, the DDIM process with dynamic time steps gives only 46.35 mAP, which is worse than using fixed time steps. diffdet_step6.log diffdet_step6_fix999_2749.log
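
The gaps between these numbers can be checked with a few lines of arithmetic. The sketch below uses only the mAP values reported in this comment; the setting labels and the helper name are introduced here for illustration.

```python
# mAP values as reported above, keyed by (setting label, number of iterations).
reported_map = {
    ("ddim", 4): 46.34,
    ("fixed", 4): 46.32,
    ("fixed_random", 4): 46.29,
    ("ddim", 6): 46.35,
    ("fixed", 6): 46.44,
}


def gain_over(baseline, setting="ddim", steps=4):
    """mAP of `setting` minus mAP of `baseline` at the same iteration count."""
    diff = reported_map[(setting, steps)] - reported_map[(baseline, steps)]
    return round(diff, 2)  # reported values have two decimal places


if __name__ == "__main__":
    print(gain_over("fixed"))            # DDIM vs fixed time step, 4 iterations
    print(gain_over("fixed_random"))     # DDIM vs fresh random boxes, 4 iterations
    print(gain_over("fixed", steps=6))   # DDIM vs fixed time step, 6 iterations
```

With four iterations DDIM is ahead by 0.02 and 0.05 mAP respectively, while with six iterations it is behind the fixed time step by 0.09 mAP.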

Please correct me if something is wrong with these experiments. The results confuse me a lot. Many thanks!

ShoufaChen commented 1 year ago

Hi,

Thanks for your interest in our work.

We've checked it and found similar results. The benefit of the diffusion model for object detection comes from two aspects: (1) random boxes and (2) iterative sampling, for example with DDIM. These interesting findings suggest that our current method mainly benefits from random boxes, whereas how to make better use of the time embedding and DDIM has not been fully explored.

Our current method is a preliminary attempt in this direction, and much room for improvement remains, e.g., a more appropriate diffusion sampling method for perception tasks. We will investigate this further in the future.