Aangss closed this issue 7 months ago.
Hi @Aangss Thank you for your interest in DEVIANT again.
I tried to test the model with the KITTI pre-trained checkpoint you provided, but the result is not very satisfactory on the Rope3D dataset.
This is a well-known problem in Mono3D. Changing the camera height messes up the Mono3D model. See BEVHeight, CVPR 2023.
So the geometric projection prior used in traditional 3D object detection does not apply to roadside datasets like Rope3D, does it?
It applies. Traditional detectors' depth goes haywire at inference time since the detector relies only on learned parameters. DEVIANT proposes a network design that ensures good and consistent depth (after depth translations) even during inference. We argue this from the ego camera movement along depth in the paper, which is slightly more intuitive to understand; equivalently, we could say the ego camera remains fixed while the object translates along the depth axis.
I don't know how to combine the rotation and translation matrices of the extrinsics.
The P2 is the camera projection matrix and includes the intrinsics (K) as well as the rotation (R) and translation (t):

P2 = K [ R | t ]
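The composition can be sketched in a few lines of NumPy. The intrinsic and extrinsic values below are illustrative placeholders, not Rope3D's actual calibration:

```python
import numpy as np

# Illustrative values only -- substitute your camera's actual calibration.
K = np.array([[721.5,   0.0, 609.6],   # fx,  0, cx
              [  0.0, 721.5, 172.9],   #  0, fy, cy
              [  0.0,   0.0,   1.0]])  # intrinsics (3x3)

R = np.eye(3)                          # world-to-camera rotation (3x3)
t = np.array([[0.06], [0.0], [0.0]])   # translation (3x1), metres

Rt = np.hstack([R, t])                 # extrinsics [R | t], shape (3, 4)
P2 = K @ Rt                            # projection matrix, shape (3, 4)

# Project a homogeneous 3D point X = (x, y, z, 1) into pixel coordinates.
X = np.array([2.0, 1.0, 10.0, 1.0])
u, v, w = P2 @ X
print(u / w, v / w)                    # pixel (u, v) after perspective divide
```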
You can look into the following references for more details:
I would like to use Rope3D for training.
That is awesome. We welcome contributions to DEVIANT repo. Please feel free to open a PR and add the corresponding Rope3D config file to this repo.
Thanks for the information. @abhi1kumar
Closing due to inactivity.
I used some of the Rope3D data for training, and the results I get are biased in pitch angle, which doesn't seem ideal! What could be causing this?
Hi @Aangss I agree, this does not look great. Here are a couple of checks I would do:
Run plot/plot_qualitative_output.py with the show_gt_in_image option:
python plot/plot_qualitative_output.py --folder YOUR_FOLDER --show_gt_in_image
Paste the config file here.
Paste the training log here.
Paste your dataloader code.
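A further check for a roadside setup: KITTI's ego camera is roughly level, while Rope3D cameras are pitched down at the road, so any pitch dropped during the calibration conversion shows up as exactly the orientation bias you describe. A minimal sanity check, assuming a roll-free camera and the KITTI axis convention (x right, y down, z forward); `camera_pitch_deg` is an illustrative helper, not part of this repo:

```python
import numpy as np

def camera_pitch_deg(R):
    # For a roll-free camera, R ~= Rx(pitch), where
    # Rx = [[1, 0, 0], [0, c, -s], [0, s, c]];
    # recover pitch from the last row: pitch = atan2(R[2,1], R[2,2]).
    return np.degrees(np.arctan2(R[2, 1], R[2, 2]))

# A roadside camera pitched ~10 degrees toward the road (illustrative).
theta = np.radians(10.0)
R = np.array([[1.0,            0.0,            0.0],
              [0.0,  np.cos(theta), -np.sin(theta)],
              [0.0,  np.sin(theta),  np.cos(theta)]])
print(camera_pitch_deg(R))  # ~= 10.0
```

If this comes out far from zero for your converted extrinsics but your KITTI-format labels assume a level camera, that mismatch would explain the pitch bias.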
I converted the Rope3D dataset to KITTI format, so I didn't change the dataloader code, only the config. I only used part of the Rope3D dataset for training, about 3000 images.
2024-03-26 19:44:40,862 INFO conf: {
random_seed: 444
dataset: {'type': 'kitti', 'root_dir': 'data/', 'train_split_name': 'train', 'val_split_name': 'val', 'resolution': [960, 512], 'eval_dataset': 'kitti', 'batch_size': 8, 'class_merging': False, 'use_dontcare': False, 'use_3d_center': True, 'writelist': ['Car', 'Pedestrian', 'Cyclist'], 'random_flip': 0.5, 'random_crop': 0.5, 'scale': 0.4, 'shift': 0.1}
model: {'type': 'gupnet', 'backbone': 'dla34', 'neck': 'DLAUp', 'use_conv': 'sesn', 'replace_style': 'max_scale_after_dla34_layer', 'sesn_norm_per_scale': False, 'sesn_rescale_basis': False, 'sesn_scales': [0.83, 0.9, 1.0], 'scale_index_for_init': 0}
optimizer: {'type': 'adam', 'lr': 0.00125, 'weight_decay': 1e-05}
lr_scheduler: {'warmup': True, 'decay_rate': 0.1, 'decay_list': [90, 120]}
trainer: {'max_epoch': 140, 'eval_frequency': 20, 'save_frequency': 20, 'disp_frequency': 20, 'log_dir': 'output/run331'}
tester: {'threshold': 0.2}
2024-03-27 01:30:45,042 INFO ------ TRAIN EPOCH 140 ------
2024-03-27 01:30:45,042 INFO Learning Rate: 0.000013
2024-03-27 01:30:45,420 INFO Weights: depth:1.0000, heading:1.0000, offset2d:1.0000, offset3d:1.0000, seg:1.0000, size2d:1.0000, size3d:1.0000,
2024-03-27 01:30:58,833 INFO BATCH[0020/0254] depth_loss:1.1811, heading_loss:0.2900, offset2d_loss:0.2085, offset3d_loss:0.2216, seg_loss:0.3941, size2d_loss:0.5219, size3d_loss:-0.2126,
2024-03-27 01:31:10,127 INFO BATCH[0040/0254] depth_loss:1.2149, heading_loss:0.3047, offset2d_loss:0.2013, offset3d_loss:0.2267, seg_loss:0.4024, size2d_loss:0.4831, size3d_loss:-0.2149,
2024-03-27 01:31:21,314 INFO BATCH[0060/0254] depth_loss:1.2102, heading_loss:0.2911, offset2d_loss:0.2050, offset3d_loss:0.2250, seg_loss:0.4028, size2d_loss:0.4889, size3d_loss:-0.2270,
2024-03-27 01:31:32,509 INFO BATCH[0080/0254] depth_loss:1.2211, heading_loss:0.3034, offset2d_loss:0.2043, offset3d_loss:0.2216, seg_loss:0.3741, size2d_loss:0.5052, size3d_loss:-0.2368,
2024-03-27 01:31:43,674 INFO BATCH[0100/0254] depth_loss:1.2019, heading_loss:0.2767, offset2d_loss:0.1972, offset3d_loss:0.2253, seg_loss:0.3887, size2d_loss:0.4724, size3d_loss:-0.2361,
2024-03-27 01:31:54,819 INFO BATCH[0120/0254] depth_loss:1.2152, heading_loss:0.3006, offset2d_loss:0.1987, offset3d_loss:0.2249, seg_loss:0.3867, size2d_loss:0.4837, size3d_loss:-0.2514,
2024-03-27 01:32:06,065 INFO BATCH[0140/0254] depth_loss:1.2160, heading_loss:0.2976, offset2d_loss:0.1994, offset3d_loss:0.2227, seg_loss:0.3920, size2d_loss:0.5068, size3d_loss:-0.2001,
2024-03-27 01:32:17,229 INFO BATCH[0160/0254] depth_loss:1.2406, heading_loss:0.3051, offset2d_loss:0.2082, offset3d_loss:0.2257, seg_loss:0.3972, size2d_loss:0.5122, size3d_loss:-0.2242,
2024-03-27 01:32:28,375 INFO BATCH[0180/0254] depth_loss:1.2895, heading_loss:0.3546, offset2d_loss:0.2281, offset3d_loss:0.2282, seg_loss:0.4598, size2d_loss:0.5326, size3d_loss:-0.1834,
2024-03-27 01:32:39,558 INFO BATCH[0200/0254] depth_loss:1.2462, heading_loss:0.2810, offset2d_loss:0.2107, offset3d_loss:0.2278, seg_loss:0.4170, size2d_loss:0.4878, size3d_loss:-0.1844,
2024-03-27 01:32:50,735 INFO BATCH[0220/0254] depth_loss:1.2312, heading_loss:0.2889, offset2d_loss:0.2154, offset3d_loss:0.2266, seg_loss:0.3959, size2d_loss:0.5213, size3d_loss:-0.2292,
2024-03-27 01:33:01,952 INFO BATCH[0240/0254] depth_loss:1.1880, heading_loss:0.2983, offset2d_loss:0.1893, offset3d_loss:0.2245, seg_loss:0.3714, size2d_loss:0.4717, size3d_loss:-0.2392,
2024-03-27 01:33:09,422 INFO BATCH[0254/0254] depth_loss:0.8752, heading_loss:0.2215, offset2d_loss:0.1520, offset3d_loss:0.1572, seg_loss:0.2689, size2d_loss:0.3420, size3d_loss:-0.1718,
2024-03-27 01:33:09,882 INFO ==> Saving to checkpoint 'output/run_331/checkpoints/checkpoint_epoch_140'
The results I get are biased in pitch angle, which doesn't seem ideal!
Can you confirm that the image in the comment corresponds to the GT projected 3D boxes?
If yes, why do I see different green and pink boxes in the BEV? Ideally, these two colored boxes should coincide in the image.
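One way to confirm is to re-project the labelled 3D box corners with your converted P2 and overlay them on the image; if the projected corners drift from the objects, the calibration conversion (rather than the model) is the likely culprit. A minimal sketch, assuming the KITTI label convention (bottom-center box origin, yaw about the camera y-axis); the helper names and numbers are illustrative:

```python
import numpy as np

def box3d_corners(h, w, l, x, y, z, ry):
    """8 corners of a KITTI-style 3D box: bottom center at (x, y, z),
    dimensions (h, w, l), yaw ry about the camera y-axis."""
    xc = np.array([ l/2,  l/2, -l/2, -l/2,  l/2,  l/2, -l/2, -l/2])
    yc = np.array([ 0.0,  0.0,  0.0,  0.0,   -h,   -h,   -h,   -h])
    zc = np.array([ w/2, -w/2, -w/2,  w/2,  w/2, -w/2, -w/2,  w/2])
    Ry = np.array([[ np.cos(ry), 0.0, np.sin(ry)],
                   [ 0.0,        1.0, 0.0       ],
                   [-np.sin(ry), 0.0, np.cos(ry)]])
    return Ry @ np.vstack([xc, yc, zc]) + np.array([[x], [y], [z]])

def project(P2, pts3d):
    """Project 3xN camera-frame points to Nx2 pixels with a 3x4 P2."""
    pts = P2 @ np.vstack([pts3d, np.ones((1, pts3d.shape[1]))])
    return (pts[:2] / pts[2]).T

# Illustrative P2 (KITTI-like intrinsics, identity extrinsics).
P2 = np.array([[721.5,   0.0, 609.6, 0.0],
               [  0.0, 721.5, 172.9, 0.0],
               [  0.0,   0.0,   1.0, 0.0]])
uv = project(P2, box3d_corners(1.5, 1.6, 3.9, 2.0, 1.5, 15.0, 0.1))
print(uv.shape)  # (8, 2) pixel corners, ready to draw over the image
```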
I converted the Rope3D dataset to KITTI format. I tried to test the model with the KITTI pre-trained checkpoint you provided, but the result is not very satisfactory. Here is a description of the data after I converted Rope3D:
So I have the following thoughts: