Zengyi-Qin / MonoGRNet

MonoGRNet: A Geometric Reasoning Network for Monocular 3D Object Detection and Localization | KITTI
Apache License 2.0
245 stars 48 forks

How to train the model well #15

Open kl456123 opened 5 years ago

kl456123 commented 5 years ago

Thanks for your great work first! I want to reproduce it quickly in the PyTorch framework (by modifying a two-stage detector directly), but I'm finding it difficult to train the model well. Note that I do train the 2D detection first; I only compute losses from pred_locations_proj, its depth, and the local corners. Could you tell me what's important to pay attention to in order to train the model better?

BTW, I cannot find the code that implements Ocam = R*Ok + C in your project. Is it important for the performance?

Zengyi-Qin commented 5 years ago

Thank you for your interest! If you're reproducing MonoGRNet based on a two-stage detector, here are some suggestions:

  • Please don't modify the RPN (first) stage, which only performs 2D detection.
  • In the second stage, there are normally 2 branches for 2D box regression and classification. I suggest adding 3 more branches to predict the 3D central depth (the IDE mentioned in the paper, the "z" coordinate in KITTI), the projected 3D center, and the local corners (3*8 = 24 values).
  • You may do ROI pooling/align from different conv layers to ensure the features have enough capacity to learn the 3 additional branches above.
  • Please balance the loss magnitudes across the 5 branches in total. You may print the losses separately and find 5 fixed weights to balance them.

As for Ocam = R*Ok + C, please refer to https://github.com/Zengyi-Qin/MonoGRNet/blob/97b6f9308e24d010713fb45e4e5ca57adf7e409c/decoder/fastBox.py#L611 for the + operation, and https://github.com/Zengyi-Qin/MonoGRNet/blob/97b6f9308e24d010713fb45e4e5ca57adf7e409c/include/utils/train_utils.py#L200 for the multiplication of Ok by R, which is not applied before the + operation but at the very end of inference (when writing the results). In that line, we convert from object coordinates to camera coordinates.
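To illustrate the Ocam = R*Ok + C decoding described above, here is a minimal NumPy sketch under the KITTI conventions (the function name `object_to_camera` is my own, not code from the repo):

```python
import numpy as np

def object_to_camera(local_corners, rotation_y, center_cam):
    """Ocam = R * Ok + C: map the 8 predicted local corners Ok
    (object frame, shape (3, 8)) into the camera frame, given the
    yaw angle rotation_y and the 3D center C in camera coordinates.
    Note the issue above says the repo applies the rotation R only
    at the very end of inference, when writing results."""
    c, s = np.cos(rotation_y), np.sin(rotation_y)
    # Rotation about the camera y-axis (KITTI rotation_y convention).
    R = np.array([[c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])
    return R @ local_corners + np.asarray(center_cam).reshape(3, 1)
```

With zero yaw the corners are simply translated by C; a nonzero yaw rotates them about the vertical axis first.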

kl456123 commented 5 years ago

Thanks for your help. BTW, I don't know what np.pi * 0.5 means there. Also, maybe the ROI ops are a big problem for monocular 3D prediction.

yxy1995123 commented 4 years ago

Thank you for your interest! If you're reproducing MonoGRNet based on a two-stage detector, here are some suggestions:

  • Please don't modify the RPN (first) stage, which only performs 2D detection.
  • In the second stage, there are normally 2 branches for 2D box regression and classification. I suggest adding 3 more branches to predict the 3D central depth (the IDE mentioned in the paper, the "z" coordinate in KITTI), the projected 3D center, and the local corners (3*8 = 24 values).
  • You may do ROI pooling/align from different conv layers to ensure the features have enough capacity to learn the 3 additional branches above.
  • Please balance the loss magnitudes across the 5 branches in total. You may print the losses separately and find 5 fixed weights to balance them.
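The loss-balancing advice in the last bullet could look like the following sketch. The weight values here are placeholders to be tuned by printing each loss, not values from the paper, and `total_loss` is a name I made up:

```python
def total_loss(losses, weights):
    """Weighted sum over the 5 branch losses: 2D classification,
    2D box regression, 3D central depth (IDE), projected 3D center,
    and local corners. Works with plain floats or framework tensors
    (e.g. PyTorch scalars). Fixed weights are chosen so the weighted
    terms have comparable magnitudes."""
    assert len(losses) == len(weights)
    return sum(w * l for w, l in zip(weights, losses))

# Placeholder weights -- print each branch loss during training and tune.
branch_weights = [1.0, 1.0, 10.0, 5.0, 1.0]
```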

As for the Ocam=R*Ok+C, please refer to

https://github.com/Zengyi-Qin/MonoGRNet/blob/97b6f9308e24d010713fb45e4e5ca57adf7e409c/decoder/fastBox.py#L611

for the + operation, and https://github.com/Zengyi-Qin/MonoGRNet/blob/97b6f9308e24d010713fb45e4e5ca57adf7e409c/include/utils/train_utils.py#L200

for the multiplication of Ok by R, which is not applied before the + operation but at the very end of inference (when writing the results). In that line, we convert from object coordinates to camera coordinates.

I want to know how to get the R matrix, or could you tell me where the multiplication of Ok happens? Thanks very much.

Zengyi-Qin commented 4 years ago

To the comment above: thanks for your interest! I just wrote a new reply giving a detailed explanation of this.
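For readers with the same question about obtaining R: under the KITTI convention, R is the rotation about the camera y-axis by the object's yaw angle (rotation_y in the KITTI label files). A minimal sketch (the helper name is my own, not code from the repo):

```python
import numpy as np

def yaw_rotation_matrix(rotation_y):
    """Rotation about the camera y-axis by the KITTI yaw angle
    rotation_y; this is the R in Ocam = R * Ok + C."""
    c, s = np.cos(rotation_y), np.sin(rotation_y)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])
```

Being a pure rotation, the matrix is orthogonal, and a zero yaw gives the identity.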