Open LOLTATQAQ opened 1 year ago
Hi,
Thanks for your interest in our work.
We checked this and found similar results. The benefit of the diffusion model for object detection comes from two aspects: (1) random boxes and (2) iterative sampling, for example, DDIM. These findings suggest that our current method mainly benefits from random boxes, whereas how to make better use of the time embedding and DDIM is not yet fully explored.
Our current method is a preliminary attempt in this direction, and there is still a lot of room for improvement, e.g., a diffusion sampling method better suited to perception tasks. We will investigate this further in the future.
Hi,
Thanks for sharing your wonderful work. I am having trouble understanding the effectiveness of the DDIM process discussed in the paper. Since the paper has no related ablation study, I ran the experiments myself, following the instructions and using the provided checkpoints. For example, I chose the diffdet.coco.res50.yaml config and the COCO Res50 checkpoint.
diffdet_step4.log
The modified detector.py is available in detector_FixandRand.zip
It seems that the performance gain introduced by the DDIM process is less than 0.05, which does not seem significant for object detection.
I then ran six iterations with the initial values fixed as in the four-iteration setting (time=999, time_next=749), which gives 46.44 mAP. However, the DDIM process with dynamic time steps gives only 46.35 mAP, which is worse than with fixed time steps.
diffdet_step6.log diffdet_step6_fix999_2749.log
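To make the comparison concrete, here is a minimal sketch of the two time-step schedules, not the code from detector.py. It assumes 1000 total diffusion timesteps and an evenly spaced DDIM schedule running from 999 down to -1, which matches the (time=999, time_next=749) pair quoted for four iterations. It also assumes that "fixed" means every iteration reuses that same pair. The function names `ddim_time_pairs` and `fixed_time_pairs` are hypothetical.

```python
def ddim_time_pairs(total_timesteps: int, sampling_steps: int) -> list[tuple[int, int]]:
    """Dynamic schedule: evenly spaced (time, time_next) pairs.

    Assumption: times run from total_timesteps - 1 down to -1, and
    fractional values are truncated to integers.
    """
    times = [
        int(-1 + i * total_timesteps / sampling_steps)
        for i in range(sampling_steps + 1)
    ]
    times.reverse()  # start from the noisiest step
    return list(zip(times[:-1], times[1:]))


def fixed_time_pairs(time: int, time_next: int, sampling_steps: int) -> list[tuple[int, int]]:
    """Fixed schedule (assumed reading): reuse one pair for every iteration."""
    return [(time, time_next)] * sampling_steps


if __name__ == "__main__":
    # Four dynamic steps: the first pair is (999, 749).
    print(ddim_time_pairs(1000, 4))
    # Six dynamic steps versus six steps that all reuse (999, 749).
    print(ddim_time_pairs(1000, 6))
    print(fixed_time_pairs(999, 749, 6))
```

Under these assumptions, the only difference between the two six-iteration runs is the (time, time_next) pair passed to each refinement step; the number of box-refinement passes is the same.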
Please correct me if there is something wrong with these experiments. It really confuses me a lot. Many thanks!