Hi, thank you for your amazing work!
I have some questions about the training process.
As far as I know, you are using four types of loss: 2D, 2D offset, 3D, and 3D offset loss. As I understand it, the x and y in the predicted 3D output are in pixels, the same as the 2D pixel coordinates, right? Why do you need to separate the 2D output and the 3D output?
And how do you create the offset ground truth? Thank you again for your time.