Yzichen opened 10 months ago
Thanks for your interest. The 6 dimensions denote (x, y, l, r, t, b): the xy center, plus the distances l, r, t, b from that center to the four box boundaries (left, right, top, bottom).
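A minimal sketch of this parameterization (the function name is illustrative, not from the MonoDETR codebase): decoding a 6-dim reference point (x, y, l, r, t, b) into 2D box corners.

```python
# Illustrative sketch, not actual MonoDETR code: turn a 6-dim
# reference point (x, y, l, r, t, b) into (x1, y1, x2, y2) corners.
# (x, y) is the (projected) center; l, r, t, b are distances from
# that center to the left/right/top/bottom box edges.

def refpoint_to_box(x, y, l, r, t, b):
    """Return (x1, y1, x2, y2) corners of the 2D box."""
    return (x - l, y - t, x + r, y + b)

# Because the projected 3D center need not sit at the 2D box center,
# l and r (and t and b) can differ:
box = refpoint_to_box(100.0, 50.0, l=30.0, r=10.0, t=20.0, b=40.0)
# box == (70.0, 30.0, 110.0, 90.0)
```

With a symmetric 2D parameterization, l = r = w/2 and t = b = h/2, which is why (w, h) suffices for plain 2D DETR.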
Thanks for your reply. Is this design from DAB-DETR?
The (l, r, t, b) parameterization is specific to monocular 3D object detection and is adopted from MonoFlex, since the projected 3D center may not be located at the center of the 2D box. For 2D DETR, (w, h) alone is enough.
May I ask what 'group_num' means?
@ysyf293 We follow the trick from GroupDETR and use multiple groups of queries, 11 by default, for more stable training.
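A hedged sketch of what this trick amounts to (names and numbers are illustrative, not the actual MonoDETR API): the embedding table holds group_num copies of the query set, all groups supervise training in parallel, and only the first group is used at inference, which is why group_num does not appear during validation.

```python
# Illustrative sketch of GroupDETR-style query groups, assuming
# num_queries=50 and group_num=11 as stand-in values.
num_queries, group_num, dim = 50, 11, 6

# Stand-in for nn.Embedding(num_queries * group_num, dim).weight:
# one learnable 6-dim reference point per query, per group.
weights = [[0.0] * dim for _ in range(num_queries * group_num)]

def select_queries(weights, training):
    if training:
        # All groups decoded in parallel; each group gets its own
        # one-to-one matching, giving more positive supervision.
        return weights
    # Inference uses only the first group, so behavior (and cost)
    # matches a plain single-group detector.
    return weights[:num_queries]

assert len(select_queries(weights, training=True)) == 550
assert len(select_queries(weights, training=False)) == 50
```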
May I ask the principle behind this design, and why it is not needed during validation? Also, I trained with your published code and submitted the test results to the KITTI server, but got only 12.34. Do you know what the problem might be? I trained on a single A6000 card.
I would like to ask which parameters in the settings should be set to True to obtain the best results. For example: two_stage: False; use_dab: True; use_dn: True; two_stage_dino: False; init_box: True?
The following config achieves the best results. We are still investigating how the two-stage/DAB/DN/DINO tricks can be used to improve MonoDETR:
two_stage: False
use_dab: False
use_dn: False
two_stage_dino: False
init_box: False
Thank you very much for your reply. But with the all-False settings you mentioned, training on the full trainval set and submitting to the KITTI website, I get only 12.34 accuracy. Why is that?
The current code and configurations are only for the KITTI val set. We will release the test-set code soon. Thanks for your patience.
I trained the new stable code on the full trainval set and submitted to the KITTI website, but got only 15.24 accuracy, which is much lower than the 16.47 reported in the paper. Are there any other details to pay attention to when training on the trainval set?
self.refpoint_embed = nn.Embedding(num_queries * group_num, 6)
May I ask what these 6 dimensions mean?