ZrrSkywalker / MonoDETR

[ICCV 2023] The first DETR model for monocular 3D object detection with depth-guided transformer

What is the meaning of refpoint_embed? #39

Open Yzichen opened 10 months ago

Yzichen commented 10 months ago

self.refpoint_embed = nn.Embedding(num_queries * group_num, 6)

May I ask what these 6 dimensions mean?

ZrrSkywalker commented 10 months ago

Thanks for your interest. The 6 dimensions denote (x, y, l, r, t, b): the xy center and the distances from that center to the four box boundaries (left, right, top, bottom).
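A minimal sketch of how such a 6-d reference point maps to a 2D box in corner form. `decode_refpoint` is a hypothetical helper, not MonoDETR code. It assumes image-style coordinates (y grows downward) and non-negative distances, and it omits any normalization (e.g. a sigmoid) the real model may apply to the raw `nn.Embedding` values:

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def decode_refpoint(ref: Tuple[float, float, float, float, float, float]) -> Box:
    """Convert a hypothetical (x, y, l, r, t, b) reference point to (x1, y1, x2, y2).

    Assumes image-style coordinates (y grows downward), so the top edge is
    y - t and the bottom edge is y + b.
    """
    x, y, l, r, t, b = ref
    return (x - l, y - t, x + r, y + b)


# The center is closer to the left and top edges than to the right and bottom ones.
print(decode_refpoint((0.5, 0.4, 0.1, 0.3, 0.05, 0.15)))
```

Because l and r (and t and b) are stored separately, the center does not have to sit in the middle of the decoded box.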

Yzichen commented 10 months ago

Thanks for your reply. Is this design from DAB-DETR?

ZrrSkywalker commented 10 months ago

The (l, r, t, b) parameterization is specific to monocular 3D object detection and is adopted from MonoFlex, since the projected 3D center may not lie at the center of the 2D box. For 2D DETR, (w, h) alone is enough.
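To illustrate why (w, h) is not enough here, the sketch below measures the four edge distances from an off-center point. The box and the projected 3D center are made-up pixel values, and `lrtb_from_box` is a hypothetical helper:

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # x1, y1, x2, y2 in pixels


def lrtb_from_box(box: Box, center: Tuple[float, float]) -> Tuple[float, float, float, float]:
    """Distances from a (possibly off-center) point to the left, right, top, bottom edges."""
    x1, y1, x2, y2 = box
    cx, cy = center
    return (cx - x1, x2 - cx, cy - y1, y2 - cy)


box = (100.0, 50.0, 200.0, 150.0)
projected_3d_center = (130.0, 110.0)  # hypothetical projected 3D center
l, r, t, b = lrtb_from_box(box, projected_3d_center)
print(l, r, t, b)  # 30.0 70.0 60.0 40.0

# A (w, h) parameterization implicitly anchors the box at its own center
# (150.0, 100.0 here), so it cannot encode this asymmetry around the 3D center.
w, h = box[2] - box[0], box[3] - box[1]
```

With (l, r, t, b), the reference point can be the projected 3D center itself while the 2D box is still recovered exactly.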

ysyf293 commented 10 months ago

May I ask what 'group_num' means?

ZrrSkywalker commented 10 months ago

@ysyf293 We follow the trick from Group DETR of using multiple groups of queries (11 by default) for more stable performance.
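A minimal sketch of the Group DETR idea, as generally described for that method rather than taken from MonoDETR's code. During training, all `num_queries * group_num` queries are used, queries in different groups are masked from attending to each other, and each group is matched to the ground truth independently (the per-group one-to-one matching is omitted here). At inference only one group is kept. The helper names and `num_queries = 50` are hypothetical; only `group_num = 11` comes from the reply above:

```python
from typing import List, Sequence


def num_active_queries(num_queries: int, group_num: int, training: bool) -> int:
    """All query groups are used in training; only one group is kept at inference."""
    return num_queries * group_num if training else num_queries


def select_refpoints(table: Sequence, num_queries: int, training: bool) -> Sequence:
    """Slice the (num_queries * group_num)-row reference-point table.

    The table stands in for refpoint_embed.weight; at inference only the
    first group of num_queries rows is kept.
    """
    return table if training else table[:num_queries]


def group_self_attn_mask(num_queries: int, group_num: int) -> List[List[bool]]:
    """Self-attention mask over all queries; True means "may not attend".

    Queries only attend to queries in their own group, so the groups behave
    like independent decoders that share weights.
    """
    total = num_queries * group_num
    return [
        [(i // num_queries) != (j // num_queries) for j in range(total)]
        for i in range(total)
    ]


# Illustrative sizes: num_queries = 50 is an assumption, group_num = 11 is the default above.
print(num_active_queries(50, 11, training=True))   # 550
print(num_active_queries(50, 11, training=False))  # 50
```

The extra groups only add training supervision, which is why a single group suffices at evaluation time under this scheme.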

ysyf293 commented 10 months ago


May I ask the principle behind this design? And why is it not needed during the validation phase? Also, I trained with your published code and submitted test results to KITTI, but got only 12.34. Do you know what the problem might be? I trained on a single A6000 GPU.

ysyf293 commented 10 months ago

I would like to ask which settings should be set to True to obtain the best results. For example: two_stage: False; use_dab: True; use_dn: True; two_stage_dino: False; TWO_STAGE_DINO: false; init_box: True?

ZrrSkywalker commented 10 months ago

The following configs achieve the best results. We are still investigating how the two-stage/DAB/DN/DINO tricks can be used to improve MonoDETR.

two_stage: False
use_dab: False
use_dn: False
two_stage_dino: False
init_box: False

ysyf293 commented 10 months ago

Thank you very much for your reply. But I used the all-"False" settings you mentioned, trained on the full trainval set, and submitted to the KITTI website, yet got a score of only 12.34. Why is that?

ZrrSkywalker commented 10 months ago

The current code and configurations are only for the KITTI val set. We will release the test code soon. Thanks for your patience.

yjy4231 commented 8 months ago


I used the new stable code to train on the full trainval set and submitted to the KITTI website, but got only 15.24, much lower than the 16.47 reported in the paper. Are there any other details to pay attention to when training on the trainval set?