Open kl456123 opened 5 years ago
Thank you for your interest! If you're reproducing MonoGRNet on top of a two-stage detector, here are some suggestions:
- Please don't modify the RPN (first) stage, which is only for 2D detection.
- In the second stage, there are normally 2 branches, for 2D box regression and classification. I suggest adding 3 more branches to predict the 3D central depth (the IDE mentioned in the paper, the "z" coordinate in KITTI), the projected 3D center, and the local corners (3*8=24 values).
- You may do ROI pooling/align from different conv layers to ensure the features have enough capacity to learn the 3 additional branches above.
- Please balance the loss magnitudes across the 5 branches in total. You may print each loss separately and find 5 fixed weights to balance them.
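The suggestions above can be sketched as follows. This is a minimal numpy mock-up, not the actual MonoGRNet code: the feature dimension, class count, branch names, and loss weights are all illustrative assumptions (the real weights must be found by printing the losses, as described above).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical second-stage head: five parallel linear branches applied to
# pooled ROI features. IN_DIM and NUM_CLASSES are assumptions.
IN_DIM, NUM_CLASSES = 1024, 3
branches = {
    "cls":      rng.normal(size=(IN_DIM, NUM_CLASSES)),  # classification
    "box2d":    rng.normal(size=(IN_DIM, 4)),            # 2D box regression
    "depth":    rng.normal(size=(IN_DIM, 1)),            # 3D central depth (z / IDE)
    "center3d": rng.normal(size=(IN_DIM, 2)),            # projected 3D center (u, v)
    "corners":  rng.normal(size=(IN_DIM, 24)),           # 8 local corners * 3 coords
}

def second_stage_head(roi_feats):
    """Apply every branch to the pooled ROI features of shape (N, IN_DIM)."""
    return {name: roi_feats @ w for name, w in branches.items()}

# Fixed per-branch weights to balance loss magnitudes (example values only,
# not tuned -- find yours by printing each loss separately during training).
loss_weights = {"cls": 1.0, "box2d": 1.0, "depth": 10.0, "center3d": 1.0, "corners": 0.5}

def total_loss(per_branch_losses):
    """Weighted sum over the five branch losses."""
    return sum(loss_weights[k] * v for k, v in per_branch_losses.items())
```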
As for Ocam = R*Ok + C, please refer to https://github.com/Zengyi-Qin/MonoGRNet/blob/97b6f9308e24d010713fb45e4e5ca57adf7e409c/decoder/fastBox.py#L611 for the "+" operation, and https://github.com/Zengyi-Qin/MonoGRNet/blob/97b6f9308e24d010713fb45e4e5ca57adf7e409c/include/utils/train_utils.py#L200 for the multiplication with Ok, which is applied not before the "+" operation but at the very end of inference (when writing the results). In that line, we transform from object coordinates to camera coordinates.
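For intuition, here is a minimal numpy sketch of the Ocam = R*Ok + C transform described above; the function name is hypothetical, and it assumes the KITTI convention of rotating by the yaw angle (ry) around the camera Y axis.

```python
import numpy as np

def corners_object_to_camera(local_corners, yaw, center_cam):
    """Transform the 8 local corners Ok (object frame) into camera
    coordinates via Ocam = R @ Ok + C.

    local_corners: (3, 8) corner offsets around the object origin (Ok)
    yaw:           rotation around the camera Y axis (KITTI ry)
    center_cam:    (3,) 3D object center C in camera coordinates
    """
    c, s = np.cos(yaw), np.sin(yaw)
    # Rotation about the vertical (Y) axis, KITTI convention.
    R = np.array([[ c, 0., s],
                  [0., 1., 0.],
                  [-s, 0., c]])
    return R @ local_corners + center_cam.reshape(3, 1)
```

With yaw = 0 this reduces to a pure translation by C, which is a quick sanity check when wiring the decoder.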
Thanks for your help. BTW, I don't know what np.pi * 0.5 means. BTW, maybe the ROI ops are a big problem for monocular 3D prediction.
I want to know how to get the R matrix, or could you tell me where the multiplication with Ok happens? Thanks very much.
To the comment above: thanks for your interest! I just wrote a new reply giving a detailed explanation of this.
Thanks for your great work first! I want to reproduce it quickly in the PyTorch framework (modifying a two-stage detector directly), but I find it somewhat difficult to train the model well. Note that I do train the 2D part first, and I only compute losses from pred_locations_proj, its depth, and the local corners. Can you tell me what's important to take care of so the model trains better?
BTW, I cannot find the code that implements Ocam = R*Ok + C in your project. Is it important for the performance?