bhack closed this issue 6 years ago
Hi @bhack
I have an uncleaned version of the code for that. Shoot me an email and I can share it with you, if you think it helps.
If there is further interest in this, I will try to organize and clean it up.
Best, Charles
Hi @charlesq34, I'm also interested in the bird's eye view code.
Hi @charlesq34, could you kindly release the bird's eye view code? It would help us all.
I implemented this idea. I can't release the full code here, but here's the rough outline:
- Use this code to transform the point cloud into an image (in my case 400x400x3, chosen experimentally).
- Transform the ground-truth 3D boxes into 2D bounding boxes on this image (my code).
- Train your favorite object detection model on this 2D detection task. I used keras-retinanet and got 87.12 AP (cars only) on the val set.
- Use these 2D bounding boxes as proposals, i.e. select the points that lie in this x/y range in velodyne coordinates (you may need to enlarge the area a little, e.g. by +-1m, to guarantee that the whole car lies inside it), then run segmentation on these cuboids. You also need to rotate the points by the "frustum angle", which you can compute as -1*np.arctan2(xc, -yc) from the points' center. Despite the difference in shape from the training data, which are frustums, the segmentation still works quite well.
- Combine the result with the camera proposals.
2D BEV detection works much better than I had imagined. Worth a try.
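The proposal-selection and rotation step above can be sketched roughly as follows. This is a minimal illustration, not the author's code: the function names are made up, and the exact rotation convention (sign, any pi/2 offset) depends on your coordinate setup; only the quoted angle formula comes from the thread.

```python
import numpy as np

def select_points_in_bev_box(points, xmin, xmax, ymin, ymax, margin=1.0):
    """Select velodyne points whose x/y fall inside an enlarged BEV box.

    points: (N, 3+) array in velodyne coordinates (x forward, y left).
    margin: enlargement in metres (the +-1m suggested above).
    """
    mask = (points[:, 0] >= xmin - margin) & (points[:, 0] <= xmax + margin) & \
           (points[:, 1] >= ymin - margin) & (points[:, 1] <= ymax + margin)
    return points[mask]

def rotate_by_frustum_angle(points):
    """Rotate points about the z-axis by the quoted 'frustum angle',
    computed from the centroid of the selected points.
    The rotation convention here is an assumption; adapt it to match
    your frustum normalization."""
    xc, yc = points[:, 0].mean(), points[:, 1].mean()
    angle = -1 * np.arctan2(xc, -yc)  # formula quoted in the thread
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    out = points.copy()
    out[:, :3] = out[:, :3] @ rot.T  # extra columns (e.g. intensity) untouched
    return out
```

The rotated cuboid can then be fed to the segmentation network in place of a camera frustum.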
Sorry, but doesn't your implementation differ from the original paper? The original input size is 600x600x7. Could you explain?
It depends on how you convert the point cloud to an image. I personally use the https://github.com/leeyevi/MV3D_TF/blob/master/tools/read_lidar.py implementation, which lets you choose the range and resolution of the conversion.
I use points up to 40m in front and +-20m to the left/right at 0.1m/pixel resolution, so it makes a 400x400 image; for the height, I only use -2~0m at 1m/pixel resolution, which gives 3 channels. You can change these numbers to suit your needs, but personally I find that the height resolution doesn't make much difference to the detection AP.
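A minimal sketch of that discretization, assuming a simple binary occupancy per height slice (the linked read_lidar.py may fill channels differently, e.g. with height, density, or intensity; with -2~0m at 1m resolution this sketch yields 2 height slices, so a third channel would come from such an extra feature):

```python
import numpy as np

def pointcloud_to_bev(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0),
                      z_range=(-2.0, 0.0), xy_res=0.1, z_res=1.0):
    """Rasterize a velodyne point cloud (x forward, y left, z up) into a
    BEV image. Defaults match the ranges quoted above: 40m forward,
    +-20m lateral at 0.1m/pixel -> a 400x400 grid."""
    w = int(round((x_range[1] - x_range[0]) / xy_res))  # 400
    h = int(round((y_range[1] - y_range[0]) / xy_res))  # 400
    d = int(round((z_range[1] - z_range[0]) / z_res))   # 2 height slices
    bev = np.zeros((w, h, d), dtype=np.float32)

    # keep only points inside the crop volume
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    keep = (x >= x_range[0]) & (x < x_range[1]) & \
           (y >= y_range[0]) & (y < y_range[1]) & \
           (z >= z_range[0]) & (z < z_range[1])
    x, y, z = x[keep], y[keep], z[keep]

    # metric coordinates -> integer cell indices
    xi = ((x - x_range[0]) / xy_res).astype(np.int32)
    yi = ((y - y_range[0]) / xy_res).astype(np.int32)
    zi = ((z - z_range[0]) / z_res).astype(np.int32)
    bev[xi, yi, zi] = 1.0  # binary occupancy per height slice
    return bev
```

Changing `x_range`/`y_range`/`xy_res` reproduces other layouts such as the paper's 600x600 input.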
Will you also release the bird's eye view experiment (section 5.3)?