dingmyu / D4LCN

A PyTorch implementation of "D4LCN: Learning Depth-Guided Convolutions for Monocular 3D Object Detection" (CVPR 2020)
MIT License
313 stars, 57 forks

low performance #4

Closed: gongshichina closed this issue 4 years ago

gongshichina commented 4 years ago

When I train with your simplified version, the performance is poor. [screenshot of results]

dingmyu commented 4 years ago

Hi, how many GPU cards did you use to train? And which depth maps did you use?

By default, we use four GPUs, batchsize=8 and iter=40000 for training. If you train with fewer GPUs or a smaller batch size, consider reducing the learning rate (e.g. 0.005) and increasing the number of iterations (e.g. 100000 for a single card).
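One way to reason about this adjustment is the common linear-scaling heuristic: scale the learning rate with the total batch size and scale the iteration count inversely, so the number of training samples seen stays roughly constant. The sketch below is a hypothetical helper, not part of the repository. The repo's default learning rate is not stated in this thread, so `base_lr` must come from your own config, and the author's single-card suggestion above (0.005, 100000 iterations) does not follow this rule exactly; treat the output only as a starting point.

```python
def scale_schedule(base_lr, num_gpus, batch_per_gpu,
                   base_batch=8, base_iters=40000):
    """Hypothetical helper: linear-scaling heuristic for lr and iterations.

    base_batch=8 and base_iters=40000 are the defaults quoted in this thread
    (4 GPUs, batch size 8, 40000 iterations). base_lr is an assumption you
    must take from your own config file.
    """
    batch = num_gpus * batch_per_gpu         # effective (total) batch size
    ratio = batch / base_batch               # < 1 when each step sees fewer images
    lr = base_lr * ratio                     # smaller batch -> smaller learning rate
    iters = int(round(base_iters / ratio))   # keep total samples seen roughly constant
    return batch, lr, iters


# Example: 2 GPUs x 2 images, with 0.01 as a placeholder base learning rate
# (0.01 is NOT taken from the repository).
print(scale_schedule(0.01, num_gpus=2, batch_per_gpu=2))
```

With a 2*2 setup this halves the learning rate and doubles the iterations; a fixed iteration budget (as in the 40000-100000 range mentioned in this thread) is a pragmatic alternative when compute is limited.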

Thanks.

dingmyu commented 4 years ago

Your results are unusually low. Can you provide more details, such as the config file?

gongshichina commented 4 years ago

Hi, I used 2 GPUs and the simplified version of the model (one dilated depth map after the 2nd block, and depth maps after the 3rd and 4th blocks, nf=2). I changed the batch size to 2*2 and kept everything else the same as in your code. I will try your advice.

Thanks for your kind reply!

DiegoJohnson commented 4 years ago

Hi, dingmyu: sorry for the naive question. Is the depth map's value the actual depth or 1/d? Is there any preprocessing for the depth map? @dingmyu

dingmyu commented 4 years ago

@gongshichina Hi, I just tested this code and it produced good results. If your performance is not good enough, you may try the following:

  1. Download my trained model and test it: get the weights, model and config file, replace resnet_dilated.py, place weight.pkl and config.pkl in the pretrain folder, and run test.sh. You should get results similar to the following:

    OLD_test_iter pretrain 2d car --> easy: 0.9298, mod: 0.8495, hard: 0.6832
    NEW_test_iter pretrain 2d car --> easy: 0.9372, mod: 0.8633, hard: 0.6983
    OLD_test_iter pretrain gr car --> easy: 0.3358, mod: 0.2543, hard: 0.2042
    NEW_test_iter pretrain gr car --> easy: 0.2923, mod: 0.2117, hard: 0.1653
    OLD_test_iter pretrain 3d car --> easy: 0.2641, mod: 0.2170, hard: 0.1780
    NEW_test_iter pretrain 3d car --> easy: 0.2135, mod: 0.1583, hard: 0.1209
    OLD_test_iter pretrain 2d pedestrian --> easy: 0.6818, mod: 0.5992, hard: 0.5141
    NEW_test_iter pretrain 2d pedestrian --> easy: 0.7161, mod: 0.6060, hard: 0.5125
    OLD_test_iter pretrain gr pedestrian --> easy: 0.0591, mod: 0.0555, hard: 0.0527
    NEW_test_iter pretrain gr pedestrian --> easy: 0.0447, mod: 0.0383, hard: 0.0311
    OLD_test_iter pretrain 3d pedestrian --> easy: 0.0412, mod: 0.0507, hard: 0.0467
    NEW_test_iter pretrain 3d pedestrian --> easy: 0.0358, mod: 0.0331, hard: 0.0276
    OLD_test_iter pretrain 2d cyclist --> easy: 0.5857, mod: 0.4178, hard: 0.4164
    NEW_test_iter pretrain 2d cyclist --> easy: 0.5914, mod: 0.4069, hard: 0.3869
    OLD_test_iter pretrain gr cyclist --> easy: 0.1291, mod: 0.1099, hard: 0.1091
    NEW_test_iter pretrain gr cyclist --> easy: 0.0495, mod: 0.0293, hard: 0.0281
    OLD_test_iter pretrain 3d cyclist --> easy: 0.1263, mod: 0.1077, hard: 0.1074
    NEW_test_iter pretrain 3d cyclist --> easy: 0.0417, mod: 0.0274, hard: 0.0263
  2. Run train.sh directly for training (40000-100000 iterations, depending on your batch size). You should get results similar to the following:

    OLD_test_iter 40000 2d car --> easy: 0.9364, mod: 0.8554, hard: 0.6883
    NEW_test_iter 40000 2d car --> easy: 0.9422, mod: 0.8696, hard: 0.7036
    OLD_test_iter 40000 gr car --> easy: 0.3496, mod: 0.2590, hard: 0.2350
    NEW_test_iter 40000 gr car --> easy: 0.3166, mod: 0.2262, hard: 0.1782
    OLD_test_iter 40000 3d car --> easy: 0.2697, mod: 0.2165, hard: 0.1824
    NEW_test_iter 40000 3d car --> easy: 0.2222, mod: 0.1619, hard: 0.1229
    OLD_test_iter 40000 2d pedestrian --> easy: 0.7507, mod: 0.5990, hard: 0.5146
    NEW_test_iter 40000 2d pedestrian --> easy: 0.7327, mod: 0.6038, hard: 0.5106
    OLD_test_iter 40000 gr pedestrian --> easy: 0.1313, mod: 0.1146, hard: 0.1131
    NEW_test_iter 40000 gr pedestrian --> easy: 0.0493, mod: 0.0450, hard: 0.0330
    OLD_test_iter 40000 3d pedestrian --> easy: 0.1282, mod: 0.1111, hard: 0.1102
    NEW_test_iter 40000 3d pedestrian --> easy: 0.0444, mod: 0.0354, hard: 0.0299
    OLD_test_iter 40000 2d cyclist --> easy: 0.6860, mod: 0.5014, hard: 0.5014
    NEW_test_iter 40000 2d cyclist --> easy: 0.7172, mod: 0.4810, hard: 0.4601
    OLD_test_iter 40000 gr cyclist --> easy: 0.0564, mod: 0.0426, hard: 0.0413
    NEW_test_iter 40000 gr cyclist --> easy: 0.0309, mod: 0.0189, hard: 0.0152
    OLD_test_iter 40000 3d cyclist --> easy: 0.0558, mod: 0.0407, hard: 0.0407
    NEW_test_iter 40000 3d cyclist --> easy: 0.0305, mod: 0.0154, hard: 0.0151

     To get more stable results, we recommend downloading the ResNet pre-trained model provided by Ruotian Luo on Google Drive and setting conf.use_rcnn_pretrain = True. To use the simplified version of our model, download the model and use it to replace models/resnet_dilate.py.

  3. If you want to continue training from my trained model (which uses DORN as the depth extractor), reduce the learning rate and the number of iterations, and modify scripts/config/depth_guided_config.py as follows:

    conf.image_means = [102.9801, 115.9465, 122.7717]
    conf.image_stds = [1, 1, 1]
    conf.depth_mean = [4413.160626995486, 4413.160626995486, 4413.160626995486]
    conf.depth_std = [3270.0158918863494, 3270.0158918863494, 3270.0158918863494]
    conf.pretrained = 'pretrain/model_40000_pkl'

     The training log should look like this:

    iter: 50, acc (bg: 1.00, fg: 0.95, iou: 0.93), loss (bbox_2d: 0.0519, bbox_3d: 0.0818, cls: 0.0431), misc (ry: 0.17, z: 0.27), dt: 2.91, eta: 32.3h
    iter: 100, acc (bg: 1.00, fg: 0.96, iou: 0.93), loss (bbox_2d: 0.0446, bbox_3d: 0.0701, cls: 0.0258), misc (ry: 0.18, z: 0.24), dt: 2.41, eta: 26.7h
    iter: 150, acc (bg: 1.00, fg: 0.96, iou: 0.94), loss (bbox_2d: 0.0439, bbox_3d: 0.0666, cls: 0.0310), misc (ry: 0.16, z: 0.25), dt: 2.26, eta: 25.0h
    iter: 200, acc (bg: 1.00, fg: 0.97, iou: 0.94), loss (bbox_2d: 0.0455, bbox_3d: 0.0671, cls: 0.0283), misc (ry: 0.17, z: 0.24), dt: 2.19, eta: 24.2h
    iter: 250, acc (bg: 1.00, fg: 0.97, iou: 0.94), loss (bbox_2d: 0.0423, bbox_3d: 0.0637, cls: 0.0195), misc (ry: 0.16, z: 0.24), dt: 2.13, eta: 23.5h
    iter: 300, acc (bg: 1.00, fg: 0.97, iou: 0.94), loss (bbox_2d: 0.0406, bbox_3d: 0.0702, cls: 0.0237), misc (ry: 0.17, z: 0.25), dt: 2.09, eta: 23.1h
    iter: 350, acc (bg: 1.00, fg: 0.98, iou: 0.94), loss (bbox_2d: 0.0362, bbox_3d: 0.0587, cls: 0.0183), misc (ry: 0.15, z: 0.24), dt: 2.07, eta: 22.8h
    iter: 400, acc (bg: 1.00, fg: 0.98, iou: 0.94), loss (bbox_2d: 0.0358, bbox_3d: 0.0557, cls: 0.0198), misc (ry: 0.15, z: 0.24), dt: 2.06, eta: 22.7h
    iter: 450, acc (bg: 1.00, fg: 0.97, iou: 0.94), loss (bbox_2d: 0.0408, bbox_3d: 0.0576, cls: 0.0226), misc (ry: 0.15, z: 0.23), dt: 2.05, eta: 22.5h
    iter: 500, acc (bg: 1.00, fg: 0.97, iou: 0.94), loss (bbox_2d: 0.0418, bbox_3d: 0.0661, cls: 0.0235), misc (ry: 0.16, z: 0.24), dt: 2.05, eta: 22.5h
    testing 100/3769, dt: 0.514, eta: 31.5m
    testing 200/3769, dt: 0.530, eta: 31.5m
    ...
    testing 3700/3769, dt: 0.707, eta: 48.8s
    OLD_test_iter 500 2d car --> easy: 0.9248, mod: 0.8515, hard: 0.6861
    NEW_test_iter 500 2d car --> easy: 0.9356, mod: 0.8634, hard: 0.6996
    OLD_test_iter 500 gr car --> easy: 0.3471, mod: 0.2545, hard: 0.2298
    NEW_test_iter 500 gr car --> easy: 0.3125, mod: 0.2206, hard: 0.1743
    OLD_test_iter 500 3d car --> easy: 0.2652, mod: 0.2117, hard: 0.1795
    NEW_test_iter 500 3d car --> easy: 0.2272, mod: 0.1565, hard: 0.1194
    OLD_test_iter 500 2d pedestrian --> easy: 0.7468, mod: 0.5981, hard: 0.5130
    NEW_test_iter 500 2d pedestrian --> easy: 0.7317, mod: 0.6216, hard: 0.5286
    OLD_test_iter 500 gr pedestrian --> easy: 0.1369, mod: 0.1162, hard: 0.1156
    NEW_test_iter 500 gr pedestrian --> easy: 0.0564, mod: 0.0474, hard: 0.0400
    OLD_test_iter 500 3d pedestrian --> easy: 0.1271, mod: 0.1123, hard: 0.1121
    NEW_test_iter 500 3d pedestrian --> easy: 0.0459, mod: 0.0365, hard: 0.0302
    OLD_test_iter 500 2d cyclist --> easy: 0.6799, mod: 0.5005, hard: 0.4962
    NEW_test_iter 500 2d cyclist --> easy: 0.7095, mod: 0.4782, hard: 0.4563
    OLD_test_iter 500 gr cyclist --> easy: 0.0486, mod: 0.0317, hard: 0.0330
    NEW_test_iter 500 gr cyclist --> easy: 0.0394, mod: 0.0213, hard: 0.0221
    OLD_test_iter 500 3d cyclist --> easy: 0.0456, mod: 0.0302, hard: 0.0295
    NEW_test_iter 500 3d cyclist --> easy: 0.0333, mod: 0.0202, hard: 0.0172
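Comparing your numbers against the reference output above is easier with a small parser than by eye. The sketch below is a hypothetical helper, not part of the repository; its regular expression assumes exactly the `OLD_test_iter <iter> <2d|gr|3d> <class> --> easy: X, mod: Y, hard: Z` format printed in this thread. Because it scans with `finditer`, it also handles results pasted onto a single line.

```python
import re

# Matches evaluation lines such as:
# "NEW_test_iter 40000 3d car --> easy: 0.2222, mod: 0.1619, hard: 0.1229"
LINE_RE = re.compile(
    r"(?P<metric>OLD|NEW)_test_iter\s+(?P<iter>\S+)\s+(?P<kind>2d|gr|3d)\s+"
    r"(?P<cls>\w+)\s+-->\s+easy:\s*(?P<easy>[\d.]+),\s*mod:\s*(?P<mod>[\d.]+),"
    r"\s*hard:\s*(?P<hard>[\d.]+)"
)


def parse_results(text):
    """Return {(metric, kind, class): (easy, mod, hard)} for each result found.

    The iteration label ("pretrain", "40000", ...) is left out of the key so
    that runs evaluated at different iterations can be compared directly.
    """
    results = {}
    for m in LINE_RE.finditer(text):
        key = (m["metric"], m["kind"], m["cls"])
        results[key] = tuple(float(m[d]) for d in ("easy", "mod", "hard"))
    return results


def diff_results(mine, reference):
    """Per-entry (mine - reference), for keys present in both runs."""
    return {
        k: tuple(round(a - b, 4) for a, b in zip(mine[k], reference[k]))
        for k in mine.keys() & reference.keys()
    }


reference = parse_results(
    "NEW_test_iter 40000 3d car --> easy: 0.2222, mod: 0.1619, hard: 0.1229"
)
mine = parse_results(
    "NEW_test_iter 40000 3d car --> easy: 0.1756, mod: 0.1241, hard: 0.0980"
)
print(diff_results(mine, reference))
```

Negative entries mark where a run falls short of the reference; sorting them by the "mod" column is a quick way to see which class and metric diverge most.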
dingmyu commented 4 years ago

@DiegoJohnson Both the real depth map (d) and the disparity map (1/d) can be used, with no pre-processing. The absolute depth value is not needed; we only use the relative depth (d or 1/d) as guidance. For different depth maps, you need to calculate their own mean and std, for example:

        conf.depth_mean = [4413.160626995486, 4413.160626995486, 4413.160626995486]  # for DORN
        conf.depth_std = [3270.0158918863494, 3270.0158918863494, 3270.0158918863494]

        conf.depth_mean = [8295.013626842678, 8295.013626842678, 8295.013626842678]  # for PSMNet
        conf.depth_std = [5134.9781439128665, 5134.9781439128665, 5134.9781439128665]

        conf.depth_mean = [30.83664619525601, 30.83664619525601, 30.83664619525601]  # for DISPNet
        conf.depth_std = [19.992999492848206, 19.992999492848206, 19.992999492848206]

        conf.depth_mean = [137.39162828, 40.58310471, 140.70854621]  # for MonoDepth
        conf.depth_std = [33.75859339, 51.479677, 65.254889]
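Computing `depth_mean` and `depth_std` for a new depth source could look like the sketch below. It assumes the depth maps are already loaded as NumPy arrays (loading is omitted because the file format depends on the depth extractor). The function names are hypothetical, and this thread does not say how the values above were computed (all pixels or valid pixels only, population or sample std); the sketch uses all pixels and the population std. Compute the statistics on exactly what you feed the network: if you use disparity (1/d), compute them on 1/d.

```python
import numpy as np


def depth_mean_std(depth_maps):
    """Per-channel mean and population std over a collection of depth maps.

    Each map is either HxW (replicated to 3 channels, matching the identical
    per-channel values in the DORN/PSMNet/DISPNet configs above) or HxWx3
    (matching the per-channel MonoDepth values). Sums are accumulated in
    float64 so that large datasets do not lose precision.
    """
    total = np.zeros(3)
    total_sq = np.zeros(3)
    count = 0
    for d in depth_maps:
        d = np.asarray(d, dtype=np.float64)
        if d.ndim == 2:                          # single-channel map -> 3 channels
            d = np.repeat(d[:, :, None], 3, axis=2)
        pixels = d.reshape(-1, 3)
        total += pixels.sum(axis=0)
        total_sq += (pixels ** 2).sum(axis=0)
        count += pixels.shape[0]
    mean = total / count
    # Var = E[x^2] - E[x]^2; clip tiny negative values caused by rounding.
    std = np.sqrt(np.maximum(total_sq / count - mean ** 2, 0.0))
    return mean.tolist(), std.tolist()


def normalize_depth(depth, mean, std):
    """Standardize one HxWx3 depth map with depth_mean / depth_std values."""
    return (np.asarray(depth, dtype=np.float64) - np.asarray(mean)) / np.asarray(std)
```

The returned lists have the same shape as the `conf.depth_mean` / `conf.depth_std` entries above, so they can be pasted into the config directly.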
Hesene commented 4 years ago

I used 2 GPUs with 40000 iterations and a batch size of 2*2, and got the following results, which look different from yours:

    OLD_test_iter 40000 2d car --> easy: 0.9175, mod: 0.7659, hard: 0.6723
    NEW_test_iter 40000 2d car --> easy: 0.9256, mod: 0.8080, hard: 0.6677
    OLD_test_iter 40000 gr car --> easy: 0.3183, mod: 0.2339, hard: 0.1928
    NEW_test_iter 40000 gr car --> easy: 0.2703, mod: 0.1880, hard: 0.1478
    OLD_test_iter 40000 3d car --> easy: 0.2382, mod: 0.1771, hard: 0.1565
    NEW_test_iter 40000 3d car --> easy: 0.1756, mod: 0.1241, hard: 0.0980
    OLD_test_iter 40000 2d pedestrian --> easy: 0.6270, mod: 0.4909, hard: 0.4104
    NEW_test_iter 40000 2d pedestrian --> easy: 0.6197, mod: 0.5032, hard: 0.4162
    OLD_test_iter 40000 gr pedestrian --> easy: 0.0327, mod: 0.0352, hard: 0.0318
    NEW_test_iter 40000 gr pedestrian --> easy: 0.0250, mod: 0.0239, hard: 0.0184
    OLD_test_iter 40000 3d pedestrian --> easy: 0.0266, mod: 0.0273, hard: 0.0277
    NEW_test_iter 40000 3d pedestrian --> easy: 0.0179, mod: 0.0158, hard: 0.0153
    OLD_test_iter 40000 2d cyclist --> easy: 0.4254, mod: 0.2570, hard: 0.2572
    NEW_test_iter 40000 2d cyclist --> easy: 0.4108, mod: 0.2477, hard: 0.2269
    OLD_test_iter 40000 gr cyclist --> easy: 0.0355, mod: 0.0216, hard: 0.0224
    NEW_test_iter 40000 gr cyclist --> easy: 0.0233, mod: 0.0145, hard: 0.0143
    OLD_test_iter 40000 3d cyclist --> easy: 0.0303, mod: 0.0196, hard: 0.0192
    NEW_test_iter 40000 3d cyclist --> easy: 0.0201, mod: 0.0116, hard: 0.0115

dingmyu commented 4 years ago

@Hesene

  1. As I noted above, by default we use 4 GPUs, batchsize=8 and iter=40000 for training. If you train with fewer GPUs or a smaller batch size, consider reducing the learning rate (e.g. 0.005) and increasing the number of iterations (e.g. 100000 for a single card).

  2. To get more stable results, it is recommended to download the ResNet pre-trained model provided by Ruotian Luo in Google Drive and set conf.use_rcnn_pretrain = True.

Thanks

dingmyu commented 4 years ago

Feel free to reopen it if you have any further questions.

Hesene commented 4 years ago

I will try it. Thanks for sharing.

Hesene commented 4 years ago

@dingmyu Hi, the link https://drive.google.com/drive/folders/0B7fNdx_jAqhtNE10TDZDbFRuU0E doesn't contain res50_faster_rcnn_iter_1190000.pth or faster_rcnn_1_10_14657.pth. Which model should we download? Thank you for sharing.

dingmyu commented 4 years ago

@Hesene Under your link, open res50/converted_from_tf/coco_900k_1190K.rar; unzip it and you will see res50_faster_rcnn_iter_1190000.pth.

Hesene commented 4 years ago

@dingmyu Hi, I unzipped it and got a file named coco_900k_1190K, not a '.pth' file, and it can't be loaded.

dingmyu commented 4 years ago

@Hesene Hi, try renaming it to .zip or .tar.gz and then unzipping it.

I can see the iter_119000 model at this link. [screenshot]
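The rename trick works when a download's real format does not match its file name. A sketch that detects the format from the file contents instead is shown below, using only the standard library (`zipfile.is_zipfile`, `tarfile.is_tarfile`). The helper name `extract_unknown_archive` is hypothetical, and the RAR branch only reports the format, since Python has no built-in RAR reader.

```python
import tarfile
import zipfile
from pathlib import Path

RAR_MAGIC = b"Rar!\x1a\x07"  # signature prefix shared by RAR 4.x and 5.x archives


def extract_unknown_archive(path, out_dir):
    """Extract an archive whose extension is missing or misleading.

    The format is detected from the file contents rather than the name.
    Returns the list of extracted member names. Only use on trusted files.
    """
    path, out_dir = Path(path), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            zf.extractall(out_dir)
            return zf.namelist()
    if tarfile.is_tarfile(path):  # also handles gzip/bz2/xz-compressed tars
        with tarfile.open(path) as tf:
            tf.extractall(out_dir)
            return tf.getnames()
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(RAR_MAGIC):
        raise ValueError(f"{path} is a RAR archive; extract it with unrar or 7-Zip")
    raise ValueError(f"{path}: unrecognized archive format")
```

Run this only on the downloaded archive, not on a `.pth` file: checkpoints saved by recent PyTorch versions are themselves zip files and would be unpacked instead of loaded.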

Hesene commented 4 years ago

Thanks a lot, I got it. Thank you again for your answer.

vobecant commented 3 years ago

Dear authors,

Thank you very much for your work. I would like to ask you a few questions.

First, when I evaluate your provided network, I get the following results:

OLD_test_iter pretrain 2d car --> easy: 0.9277, mod: 0.8439, hard: 0.6785
NEW_test_iter pretrain 2d car --> easy: 0.9342, mod: 0.8377, hard: 0.6742
OLD_test_iter pretrain gr car --> easy: 0.3349, mod: 0.2507, hard: 0.1983
NEW_test_iter pretrain gr car --> easy: 0.3225, mod: 0.2268, hard: 0.1722
OLD_test_iter pretrain 3d car --> easy: 0.2490, mod: 0.2077, hard: 0.1729
NEW_test_iter pretrain 3d car --> easy: 0.2317, mod: 0.1621, hard: 0.1234
OLD_test_iter pretrain 2d pedestrian --> easy: 0.6618, mod: 0.5812, hard: 0.4975
NEW_test_iter pretrain 2d pedestrian --> easy: 0.6896, mod: 0.5670, hard: 0.4756
OLD_test_iter pretrain gr pedestrian --> easy: 0.0628, mod: 0.0512, hard: 0.0483
NEW_test_iter pretrain gr pedestrian --> easy: 0.0471, mod: 0.0391, hard: 0.0321
OLD_test_iter pretrain 3d pedestrian --> easy: 0.0436, mod: 0.0445, hard: 0.0396
NEW_test_iter pretrain 3d pedestrian --> easy: 0.0371, mod: 0.0293, hard: 0.0270
OLD_test_iter pretrain 2d cyclist --> easy: 0.6234, mod: 0.4608, hard: 0.3972
NEW_test_iter pretrain 2d cyclist --> easy: 0.6301, mod: 0.4180, hard: 0.3816
OLD_test_iter pretrain gr cyclist --> easy: 0.0344, mod: 0.0296, hard: 0.0306
NEW_test_iter pretrain gr cyclist --> easy: 0.0295, mod: 0.0168, hard: 0.0168
OLD_test_iter pretrain 3d cyclist --> easy: 0.0293, mod: 0.0270, hard: 0.0262
NEW_test_iter pretrain 3d cyclist --> easy: 0.0263, mod: 0.0149, hard: 0.0148

These are reasonable results, but they are not the same as the ones reported in your paper, namely these:

[Screenshot 2021-03-27 at 18 58 07: results table from the paper]

Also, when I run train.sh, I get results similar to those from the provided model, but they still differ from the paper. In fact, they are significantly better than the paper for the pedestrian class and better for the cyclist class.

OLD_test_iter 40000 2d car --> easy: 0.8290, mod: 0.7506, hard: 0.5892
NEW_test_iter 40000 2d car --> easy: 0.8759, mod: 0.7708, hard: 0.6137
OLD_test_iter 40000 gr car --> easy: 0.3448, mod: 0.2528, hard: 0.2053
NEW_test_iter 40000 gr car --> easy: 0.3066, mod: 0.2115, hard: 0.1653
OLD_test_iter 40000 3d car --> easy: 0.2671, mod: 0.1953, hard: 0.1754
NEW_test_iter 40000 3d car --> easy: 0.2230, mod: 0.1503, hard: 0.1193
OLD_test_iter 40000 2d pedestrian --> easy: 0.5670, mod: 0.4883, hard: 0.4096
NEW_test_iter 40000 2d pedestrian --> easy: 0.5822, mod: 0.4813, hard: 0.3946
OLD_test_iter 40000 gr pedestrian --> easy: 0.1323, mod: 0.1156, hard: 0.1137
NEW_test_iter 40000 gr pedestrian --> easy: 0.0528, mod: 0.0424, hard: 0.0351
OLD_test_iter 40000 3d pedestrian --> easy: 0.0473, mod: 0.0482, hard: 0.0413
NEW_test_iter 40000 3d pedestrian --> easy: 0.0405, mod: 0.0314, hard: 0.0287
OLD_test_iter 40000 2d cyclist --> easy: 0.4861, mod: 0.3255, hard: 0.3241
NEW_test_iter 40000 2d cyclist --> easy: 0.4460, mod: 0.2657, hard: 0.2633
OLD_test_iter 40000 gr cyclist --> easy: 0.1132, mod: 0.1058, hard: 0.1064
NEW_test_iter 40000 gr cyclist --> easy: 0.0375, mod: 0.0242, hard: 0.0238
OLD_test_iter 40000 3d cyclist --> easy: 0.1070, mod: 0.0909, hard: 0.0909
NEW_test_iter 40000 3d cyclist --> easy: 0.0213, mod: 0.0141, hard: 0.0144

Can you please tell me how I can obtain the same results as in the paper? Thank you!