charlesq34 / frustum-pointnets

Frustum PointNets for 3D Object Detection from RGB-D Data

Bird's eye view #3

Closed bhack closed 6 years ago

bhack commented 6 years ago

Will you also release the bird's eye view experiment (Section 5.3)?

charlesq34 commented 6 years ago

Hi @bhack

I have an uncleaned version of the code for that. Shoot me an email and I can share it with you, if you think it helps.

If there is wider interest in this, I will try to organize and clean it up.

Best, Charles

srikanthmalla commented 6 years ago

@bhack, Check this out: https://github.com/qianguih/voxelnet/blob/4ede13972d5fc51bd8e8f9cc5ed4e8311842ee44/utils/utils.py#L353

mtamburrano commented 6 years ago

Hi @charlesq34, I'm also interested in the bird's eye view code.

chowkamlee81 commented 6 years ago

Hi @charlesq34, kindly release the bird's eye view code. It would help us all.

kwea123 commented 6 years ago

I implemented this idea. I can't release the full code here, but here's the rough approach:

  1. Use this code to transform the point cloud into a BEV image (in my case 400x400x3, chosen experimentally).
  2. Transform the ground-truth 3D boxes into 2D bounding boxes on this image. my code
  3. Train your favorite 2D object detection model on this task. I used keras-retinanet and got 87.12 AP (cars only) on the val set.
  4. Use these 2D bounding boxes as proposals, i.e. select the points that lie within this xy-range in velodyne coordinates (you may need to enlarge the area a little, e.g. by +-1 m, to guarantee that the whole car lies inside), then run segmentation on these cuboids. You also need to rotate the points according to the "frustum angle", which you can compute as -1*np.arctan2(xc, -yc) from the points' center (see the sketch after this list). Although these cuboids differ in shape from the training data, which are frustums, the segmentation still works quite well.
  5. Combine the results with the camera proposals.
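
For concreteness, here is a minimal sketch of step 4, assuming the proposal comes as an (x, y) range in velodyne coordinates; the 1 m margin and the function/variable names are illustrative, not the actual code I used:

```python
import numpy as np

def extract_bev_proposal(points, x_range, y_range, margin=1.0):
    """points: (N, 4) velodyne points (x, y, z, intensity).
    x_range, y_range: (min, max) of the BEV proposal in velodyne coordinates."""
    # Enlarge the proposal a little so the whole object lies inside.
    x_min, x_max = x_range[0] - margin, x_range[1] + margin
    y_min, y_max = y_range[0] - margin, y_range[1] + margin
    mask = ((points[:, 0] >= x_min) & (points[:, 0] <= x_max) &
            (points[:, 1] >= y_min) & (points[:, 1] <= y_max))
    cuboid = points[mask]  # copy, safe to modify (empty selections not handled)

    # "Frustum angle" computed from the selected points' center, as above.
    xc, yc = cuboid[:, 0].mean(), cuboid[:, 1].mean()
    angle = -1 * np.arctan2(xc, -yc)

    # Rotate the xy coordinates so the proposal center points in a canonical
    # direction before feeding the points to the segmentation network.
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]])
    cuboid[:, :2] = cuboid[:, :2] @ rot.T
    return cuboid, angle
```

The rotation only normalizes the proposal's heading so the segmentation network sees each cuboid in a canonical frame, analogous to the frustum rotation in the paper.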

2D BEV detection works much better than I had imagined. Worth a try.

gujiaqivadin commented 5 years ago


@kwea123 Sorry, but your realization seems to differ from the original paper: the input size there is 600x600x7. Could you explain this to me?

kwea123 commented 5 years ago

It depends on your implementation of the point-cloud-to-image conversion. I personally use the https://github.com/leeyevi/MV3D_TF/blob/master/tools/read_lidar.py implementation, which lets you choose the range and resolution of the conversion.

I use points up to 40 m in front and +-20 m to the left/right with 0.1 m/pixel resolution, which gives a 400x400 image; for the height, I only use -2~0 m at 1 m resolution, so it ends up being 3 channels. You can change these numbers according to your needs, but personally I find that the height resolution doesn't make much difference in detection AP.
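
As a rough sketch of that conversion (not the linked read_lidar.py; the exact channel layout is an assumption, here two 1 m occupancy slices plus a point-density channel making up the 3 channels):

```python
import numpy as np

def velodyne_to_bev(points,
                    x_range=(0.0, 40.0), y_range=(-20.0, 20.0),
                    z_range=(-2.0, 0.0), xy_res=0.1, z_res=1.0):
    """points: (N, 4) velodyne points (x, y, z, intensity).
    Returns a (400, 400, 3) BEV image with the default ranges/resolutions."""
    h = int((x_range[1] - x_range[0]) / xy_res)        # 400 rows (forward)
    w = int((y_range[1] - y_range[0]) / xy_res)        # 400 cols (lateral)
    n_slices = int((z_range[1] - z_range[0]) / z_res)  # 2 height slices

    keep = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]) &
            (points[:, 2] >= z_range[0]) & (points[:, 2] < z_range[1]))
    pts = points[keep]

    # Discretize to pixel / height-slice indices.
    row = ((pts[:, 0] - x_range[0]) / xy_res).astype(np.int32)
    col = ((pts[:, 1] - y_range[0]) / xy_res).astype(np.int32)
    sli = ((pts[:, 2] - z_range[0]) / z_res).astype(np.int32)

    bev = np.zeros((h, w, n_slices + 1), dtype=np.float32)
    bev[row, col, sli] = 1.0                    # per-slice occupancy
    np.add.at(bev[:, :, -1], (row, col), 1.0)   # point density in last channel
    return bev
```

You can then draw the projected ground-truth boxes on this image to generate the 2D training targets for the detector.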