gongshichina closed this issue 4 years ago.
Hi, how many GPU cards did you use to train? And which depth maps did you use?
By default, we train with four GPUs, batch size 8 and 40,000 iterations. If you train with fewer GPUs or a smaller batch size, consider reducing the learning rate (e.g. 0.005) and increasing the number of iterations (e.g. 100,000 for a single card).
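If the config follows the same conf.* pattern shown later in this thread, the relevant lines would look roughly like the sketch below; the option names are my assumptions, so check scripts/config/depth_guided_config.py for the actual ones:
conf.batch_size = 2  # assumed name; e.g. a single smaller GPU (default is 8 across four GPUs)
conf.lr = 0.005  # assumed name; reduced learning rate for the smaller batch
conf.max_iter = 100000  # assumed name; more iterations than the default 40000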
Thanks.
Your result is surprisingly low. Can you provide more details, such as the config file?
Hi, I used 2 GPUs and the simplified version of the model (one dilated depth map after the 2nd block, and depth maps after the 3rd and 4th blocks, nf=2). I changed the batch size to 2*2 and kept everything else the same as your code. I will try your advice.
Thanks for your kind reply!
Hi @dingmyu, sorry for the naive question: is the depth map's value the actual depth or 1/d? Is any preprocessing needed for the depth map?
@gongshichina Hi, I just tested this code and it gives good performance. If your results are not good enough, you may try the following:
Download my trained model and test it: get the weights, model and config file, replace resnet_dilated.py, place weight.pkl and config.pkl in the pretrain folder, and run test.sh. It should give results similar to the following:
OLD_test_iter pretrain 2d car --> easy: 0.9298, mod: 0.8495, hard: 0.6832
NEW_test_iter pretrain 2d car --> easy: 0.9372, mod: 0.8633, hard: 0.6983
OLD_test_iter pretrain gr car --> easy: 0.3358, mod: 0.2543, hard: 0.2042
NEW_test_iter pretrain gr car --> easy: 0.2923, mod: 0.2117, hard: 0.1653
OLD_test_iter pretrain 3d car --> easy: 0.2641, mod: 0.2170, hard: 0.1780
NEW_test_iter pretrain 3d car --> easy: 0.2135, mod: 0.1583, hard: 0.1209
OLD_test_iter pretrain 2d pedestrian --> easy: 0.6818, mod: 0.5992, hard: 0.5141
NEW_test_iter pretrain 2d pedestrian --> easy: 0.7161, mod: 0.6060, hard: 0.5125
OLD_test_iter pretrain gr pedestrian --> easy: 0.0591, mod: 0.0555, hard: 0.0527
NEW_test_iter pretrain gr pedestrian --> easy: 0.0447, mod: 0.0383, hard: 0.0311
OLD_test_iter pretrain 3d pedestrian --> easy: 0.0412, mod: 0.0507, hard: 0.0467
NEW_test_iter pretrain 3d pedestrian --> easy: 0.0358, mod: 0.0331, hard: 0.0276
OLD_test_iter pretrain 2d cyclist --> easy: 0.5857, mod: 0.4178, hard: 0.4164
NEW_test_iter pretrain 2d cyclist --> easy: 0.5914, mod: 0.4069, hard: 0.3869
OLD_test_iter pretrain gr cyclist --> easy: 0.1291, mod: 0.1099, hard: 0.1091
NEW_test_iter pretrain gr cyclist --> easy: 0.0495, mod: 0.0293, hard: 0.0281
OLD_test_iter pretrain 3d cyclist --> easy: 0.1263, mod: 0.1077, hard: 0.1074
NEW_test_iter pretrain 3d cyclist --> easy: 0.0417, mod: 0.0274, hard: 0.0263
Run train.sh directly for training (iterations: 40,000-100,000, depending on your batch size). It should give results similar to the following:
OLD_test_iter 40000 2d car --> easy: 0.9364, mod: 0.8554, hard: 0.6883
NEW_test_iter 40000 2d car --> easy: 0.9422, mod: 0.8696, hard: 0.7036
OLD_test_iter 40000 gr car --> easy: 0.3496, mod: 0.2590, hard: 0.2350
NEW_test_iter 40000 gr car --> easy: 0.3166, mod: 0.2262, hard: 0.1782
OLD_test_iter 40000 3d car --> easy: 0.2697, mod: 0.2165, hard: 0.1824
NEW_test_iter 40000 3d car --> easy: 0.2222, mod: 0.1619, hard: 0.1229
OLD_test_iter 40000 2d pedestrian --> easy: 0.7507, mod: 0.5990, hard: 0.5146
NEW_test_iter 40000 2d pedestrian --> easy: 0.7327, mod: 0.6038, hard: 0.5106
OLD_test_iter 40000 gr pedestrian --> easy: 0.1313, mod: 0.1146, hard: 0.1131
NEW_test_iter 40000 gr pedestrian --> easy: 0.0493, mod: 0.0450, hard: 0.0330
OLD_test_iter 40000 3d pedestrian --> easy: 0.1282, mod: 0.1111, hard: 0.1102
NEW_test_iter 40000 3d pedestrian --> easy: 0.0444, mod: 0.0354, hard: 0.0299
OLD_test_iter 40000 2d cyclist --> easy: 0.6860, mod: 0.5014, hard: 0.5014
NEW_test_iter 40000 2d cyclist --> easy: 0.7172, mod: 0.4810, hard: 0.4601
OLD_test_iter 40000 gr cyclist --> easy: 0.0564, mod: 0.0426, hard: 0.0413
NEW_test_iter 40000 gr cyclist --> easy: 0.0309, mod: 0.0189, hard: 0.0152
OLD_test_iter 40000 3d cyclist --> easy: 0.0558, mod: 0.0407, hard: 0.0407
NEW_test_iter 40000 3d cyclist --> easy: 0.0305, mod: 0.0154, hard: 0.0151
To get more stable results, it is recommended to download the ResNet pre-trained model provided by Ruotian Luo on Google Drive and set conf.use_rcnn_pretrain = True. To use the simplified version of our model, download the model and use it to replace models/resnet_dilate.py.
If you want to train further based on my trained model (using DORN as the depth extractor), you need to reduce the learning rate and iterations (a rough sketch of those lines is given after the snippet below) and modify scripts/config/depth_guided_config.py as follows:
conf.image_means = [102.9801, 115.9465, 122.7717]
conf.image_stds = [1, 1, 1]
conf.depth_mean = [4413.160626995486, 4413.160626995486, 4413.160626995486]
conf.depth_std = [3270.0158918863494, 3270.0158918863494, 3270.0158918863494]
conf.pretrained = 'pretrain/model_40000_pkl'
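The lines above only cover normalization and the checkpoint; the reduced learning rate and iteration count are not shown. A rough sketch of those extra lines, assuming the config exposes conf.lr and conf.max_iter (both names and values here are my guesses, not the authors' settings):
conf.lr = 0.001  # assumed: noticeably lower than the from-scratch learning rate
conf.max_iter = 20000  # assumed: fewer iterations, since training resumes from model_40000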
The training log should be displayed as:
iter: 50, acc (bg: 1.00, fg: 0.95, iou: 0.93), loss (bbox_2d: 0.0519, bbox_3d: 0.0818, cls: 0.0431), misc (ry: 0.17, z: 0.27), dt: 2.91, eta: 32.3h
iter: 100, acc (bg: 1.00, fg: 0.96, iou: 0.93), loss (bbox_2d: 0.0446, bbox_3d: 0.0701, cls: 0.0258), misc (ry: 0.18, z: 0.24), dt: 2.41, eta: 26.7h
iter: 150, acc (bg: 1.00, fg: 0.96, iou: 0.94), loss (bbox_2d: 0.0439, bbox_3d: 0.0666, cls: 0.0310), misc (ry: 0.16, z: 0.25), dt: 2.26, eta: 25.0h
iter: 200, acc (bg: 1.00, fg: 0.97, iou: 0.94), loss (bbox_2d: 0.0455, bbox_3d: 0.0671, cls: 0.0283), misc (ry: 0.17, z: 0.24), dt: 2.19, eta: 24.2h
iter: 250, acc (bg: 1.00, fg: 0.97, iou: 0.94), loss (bbox_2d: 0.0423, bbox_3d: 0.0637, cls: 0.0195), misc (ry: 0.16, z: 0.24), dt: 2.13, eta: 23.5h
iter: 300, acc (bg: 1.00, fg: 0.97, iou: 0.94), loss (bbox_2d: 0.0406, bbox_3d: 0.0702, cls: 0.0237), misc (ry: 0.17, z: 0.25), dt: 2.09, eta: 23.1h
iter: 350, acc (bg: 1.00, fg: 0.98, iou: 0.94), loss (bbox_2d: 0.0362, bbox_3d: 0.0587, cls: 0.0183), misc (ry: 0.15, z: 0.24), dt: 2.07, eta: 22.8h
iter: 400, acc (bg: 1.00, fg: 0.98, iou: 0.94), loss (bbox_2d: 0.0358, bbox_3d: 0.0557, cls: 0.0198), misc (ry: 0.15, z: 0.24), dt: 2.06, eta: 22.7h
iter: 450, acc (bg: 1.00, fg: 0.97, iou: 0.94), loss (bbox_2d: 0.0408, bbox_3d: 0.0576, cls: 0.0226), misc (ry: 0.15, z: 0.23), dt: 2.05, eta: 22.5h
iter: 500, acc (bg: 1.00, fg: 0.97, iou: 0.94), loss (bbox_2d: 0.0418, bbox_3d: 0.0661, cls: 0.0235), misc (ry: 0.16, z: 0.24), dt: 2.05, eta: 22.5h
testing 100/3769, dt: 0.514, eta: 31.5m
testing 200/3769, dt: 0.530, eta: 31.5m
...
testing 3700/3769, dt: 0.707, eta: 48.8s
OLD_test_iter 500 2d car --> easy: 0.9248, mod: 0.8515, hard: 0.6861
NEW_test_iter 500 2d car --> easy: 0.9356, mod: 0.8634, hard: 0.6996
OLD_test_iter 500 gr car --> easy: 0.3471, mod: 0.2545, hard: 0.2298
NEW_test_iter 500 gr car --> easy: 0.3125, mod: 0.2206, hard: 0.1743
OLD_test_iter 500 3d car --> easy: 0.2652, mod: 0.2117, hard: 0.1795
NEW_test_iter 500 3d car --> easy: 0.2272, mod: 0.1565, hard: 0.1194
OLD_test_iter 500 2d pedestrian --> easy: 0.7468, mod: 0.5981, hard: 0.5130
NEW_test_iter 500 2d pedestrian --> easy: 0.7317, mod: 0.6216, hard: 0.5286
OLD_test_iter 500 gr pedestrian --> easy: 0.1369, mod: 0.1162, hard: 0.1156
NEW_test_iter 500 gr pedestrian --> easy: 0.0564, mod: 0.0474, hard: 0.0400
OLD_test_iter 500 3d pedestrian --> easy: 0.1271, mod: 0.1123, hard: 0.1121
NEW_test_iter 500 3d pedestrian --> easy: 0.0459, mod: 0.0365, hard: 0.0302
OLD_test_iter 500 2d cyclist --> easy: 0.6799, mod: 0.5005, hard: 0.4962
NEW_test_iter 500 2d cyclist --> easy: 0.7095, mod: 0.4782, hard: 0.4563
OLD_test_iter 500 gr cyclist --> easy: 0.0486, mod: 0.0317, hard: 0.0330
NEW_test_iter 500 gr cyclist --> easy: 0.0394, mod: 0.0213, hard: 0.0221
OLD_test_iter 500 3d cyclist --> easy: 0.0456, mod: 0.0302, hard: 0.0295
NEW_test_iter 500 3d cyclist --> easy: 0.0333, mod: 0.0202, hard: 0.0172
@DiegoJohnson Both the real depth map (d) and the disparity map (1/d) can be used, and no pre-processing is needed. In fact, the absolute depth value is not required; we just use the relative depth (d or 1/d) as guidance. For different depth maps you need to calculate their mean and std, for example (a sketch for computing these statistics is given after the examples below):
conf.depth_mean = [4413.160626995486, 4413.160626995486, 4413.160626995486] # for DORN
conf.depth_std = [3270.0158918863494, 3270.0158918863494, 3270.0158918863494]
conf.depth_mean = [8295.013626842678, 8295.013626842678, 8295.013626842678] # for PSMNet
conf.depth_std = [5134.9781439128665, 5134.9781439128665, 5134.9781439128665]
conf.depth_mean = [30.83664619525601, 30.83664619525601, 30.83664619525601] # for DISPNet
conf.depth_std = [19.992999492848206, 19.992999492848206, 19.992999492848206]
conf.depth_mean = [137.39162828, 40.58310471, 140.70854621] # for MonoDepth
conf.depth_std = [33.75859339, 51.479677, 65.254889]
I used 2 GPUs with 40000 iterations and a batch size of 2*2, and got the results below. They look different from yours:
OLD_test_iter 40000 2d car --> easy: 0.9175, mod: 0.7659, hard: 0.6723
NEW_test_iter 40000 2d car --> easy: 0.9256, mod: 0.8080, hard: 0.6677
OLD_test_iter 40000 gr car --> easy: 0.3183, mod: 0.2339, hard: 0.1928
NEW_test_iter 40000 gr car --> easy: 0.2703, mod: 0.1880, hard: 0.1478
OLD_test_iter 40000 3d car --> easy: 0.2382, mod: 0.1771, hard: 0.1565
NEW_test_iter 40000 3d car --> easy: 0.1756, mod: 0.1241, hard: 0.0980
OLD_test_iter 40000 2d pedestrian --> easy: 0.6270, mod: 0.4909, hard: 0.4104
NEW_test_iter 40000 2d pedestrian --> easy: 0.6197, mod: 0.5032, hard: 0.4162
OLD_test_iter 40000 gr pedestrian --> easy: 0.0327, mod: 0.0352, hard: 0.0318
NEW_test_iter 40000 gr pedestrian --> easy: 0.0250, mod: 0.0239, hard: 0.0184
OLD_test_iter 40000 3d pedestrian --> easy: 0.0266, mod: 0.0273, hard: 0.0277
NEW_test_iter 40000 3d pedestrian --> easy: 0.0179, mod: 0.0158, hard: 0.0153
OLD_test_iter 40000 2d cyclist --> easy: 0.4254, mod: 0.2570, hard: 0.2572
NEW_test_iter 40000 2d cyclist --> easy: 0.4108, mod: 0.2477, hard: 0.2269
OLD_test_iter 40000 gr cyclist --> easy: 0.0355, mod: 0.0216, hard: 0.0224
NEW_test_iter 40000 gr cyclist --> easy: 0.0233, mod: 0.0145, hard: 0.0143
OLD_test_iter 40000 3d cyclist --> easy: 0.0303, mod: 0.0196, hard: 0.0192
NEW_test_iter 40000 3d cyclist --> easy: 0.0201, mod: 0.0116, hard: 0.0115
@Hesene
As I noted above, by default we train with 4 GPUs, batch size 8 and 40,000 iterations. If you train with fewer GPUs or a smaller batch size, consider reducing the learning rate (e.g. 0.005) and increasing the number of iterations (e.g. 100,000 for a single card).
To get more stable results, it is recommended to download the ResNet pre-trained model provided by Ruotian Luo on Google Drive and set conf.use_rcnn_pretrain = True.
Thanks.
Feel free to reopen it if you have any further questions.
I will try it. Thanks for sharing!
Feel free to reopen it if you have any further questions.
@dingmyu Hi, the link https://drive.google.com/drive/folders/0B7fNdx_jAqhtNE10TDZDbFRuU0E does not contain res50_faster_rcnn_iter_1190000.pth or faster_rcnn_1_10_14657.pth. Which model should we download? Thank you for sharing.
@Hesene In your link, go to res50/converted_from_tf/coco_900k_1190K.rar, unzip it, and you will see res50_faster_rcnn_iter_1190000.pth.
@dingmyu Hi, I unzipped it and got a coco_900k_1190K file, not a '.pth' file, and it cannot be loaded.
@Hesene Hi, try renaming it to .zip or .tar.gz and then unzipping it.
I can see the iter_1190000 model at this link.
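If renaming feels hacky, a small Python sketch along these lines can probe and extract the archive whatever its real format is (the file name and target folder are assumptions):
import zipfile, tarfile

path = 'coco_900k_1190K.rar'  # assumed name of the downloaded file
if zipfile.is_zipfile(path):
    with zipfile.ZipFile(path) as z:
        z.extractall('pretrain/')  # should yield res50_faster_rcnn_iter_1190000.pth
elif tarfile.is_tarfile(path):
    with tarfile.open(path) as t:
        t.extractall('pretrain/')
else:
    print('Not a zip/tar archive; it may be a genuine RAR, so use unrar instead.')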
Thanks a lot, I got it. Thank you for your answer again!
Dear authors,
thank you very much for your work. I would like to ask you a few questions.
First, when I evaluate your provided network, I get the following results:
OLD_test_iter pretrain 2d car --> easy: 0.9277, mod: 0.8439, hard: 0.6785
NEW_test_iter pretrain 2d car --> easy: 0.9342, mod: 0.8377, hard: 0.6742
OLD_test_iter pretrain gr car --> easy: 0.3349, mod: 0.2507, hard: 0.1983
NEW_test_iter pretrain gr car --> easy: 0.3225, mod: 0.2268, hard: 0.1722
OLD_test_iter pretrain 3d car --> easy: 0.2490, mod: 0.2077, hard: 0.1729
NEW_test_iter pretrain 3d car --> easy: 0.2317, mod: 0.1621, hard: 0.1234
OLD_test_iter pretrain 2d pedestrian --> easy: 0.6618, mod: 0.5812, hard: 0.4975
NEW_test_iter pretrain 2d pedestrian --> easy: 0.6896, mod: 0.5670, hard: 0.4756
OLD_test_iter pretrain gr pedestrian --> easy: 0.0628, mod: 0.0512, hard: 0.0483
NEW_test_iter pretrain gr pedestrian --> easy: 0.0471, mod: 0.0391, hard: 0.0321
OLD_test_iter pretrain 3d pedestrian --> easy: 0.0436, mod: 0.0445, hard: 0.0396
NEW_test_iter pretrain 3d pedestrian --> easy: 0.0371, mod: 0.0293, hard: 0.0270
OLD_test_iter pretrain 2d cyclist --> easy: 0.6234, mod: 0.4608, hard: 0.3972
NEW_test_iter pretrain 2d cyclist --> easy: 0.6301, mod: 0.4180, hard: 0.3816
OLD_test_iter pretrain gr cyclist --> easy: 0.0344, mod: 0.0296, hard: 0.0306
NEW_test_iter pretrain gr cyclist --> easy: 0.0295, mod: 0.0168, hard: 0.0168
OLD_test_iter pretrain 3d cyclist --> easy: 0.0293, mod: 0.0270, hard: 0.0262
NEW_test_iter pretrain 3d cyclist --> easy: 0.0263, mod: 0.0149, hard: 0.0148
These are OK results, but they are not the same as the ones you report in your paper (I am referring to the results table in the paper).
Also, when I run train.sh, I get results similar to those from the provided model, but they are still not the same as in the paper. In fact, the paper's results are significantly better for the pedestrian class and better for the cyclist class.
OLD_test_iter 40000 2d car --> easy: 0.8290, mod: 0.7506, hard: 0.5892
NEW_test_iter 40000 2d car --> easy: 0.8759, mod: 0.7708, hard: 0.6137
OLD_test_iter 40000 gr car --> easy: 0.3448, mod: 0.2528, hard: 0.2053
NEW_test_iter 40000 gr car --> easy: 0.3066, mod: 0.2115, hard: 0.1653
OLD_test_iter 40000 3d car --> easy: 0.2671, mod: 0.1953, hard: 0.1754
NEW_test_iter 40000 3d car --> easy: 0.2230, mod: 0.1503, hard: 0.1193
OLD_test_iter 40000 2d pedestrian --> easy: 0.5670, mod: 0.4883, hard: 0.4096
NEW_test_iter 40000 2d pedestrian --> easy: 0.5822, mod: 0.4813, hard: 0.3946
OLD_test_iter 40000 gr pedestrian --> easy: 0.1323, mod: 0.1156, hard: 0.1137
NEW_test_iter 40000 gr pedestrian --> easy: 0.0528, mod: 0.0424, hard: 0.0351
OLD_test_iter 40000 3d pedestrian --> easy: 0.0473, mod: 0.0482, hard: 0.0413
NEW_test_iter 40000 3d pedestrian --> easy: 0.0405, mod: 0.0314, hard: 0.0287
OLD_test_iter 40000 2d cyclist --> easy: 0.4861, mod: 0.3255, hard: 0.3241
NEW_test_iter 40000 2d cyclist --> easy: 0.4460, mod: 0.2657, hard: 0.2633
OLD_test_iter 40000 gr cyclist --> easy: 0.1132, mod: 0.1058, hard: 0.1064
NEW_test_iter 40000 gr cyclist --> easy: 0.0375, mod: 0.0242, hard: 0.0238
OLD_test_iter 40000 3d cyclist --> easy: 0.1070, mod: 0.0909, hard: 0.0909
NEW_test_iter 40000 3d cyclist --> easy: 0.0213, mod: 0.0141, hard: 0.0144
Can you please tell me how I can obtain the same results as in the paper? Thank you!
When I trained with your simplified version, it produced poor performance.