WeijingShi / Point-GNN

Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud, CVPR 2020.

Some questions about the paper? #20

Closed: swzaaaaaaa closed 4 years ago

swzaaaaaaa commented 4 years ago

Hello, author. I recently read your paper, and there are some points I do not understand. I hope you can answer them, thank you! The questions are as follows:

  1. As the paper describes, Point-GNN mainly consists of graph construction, an iterative GNN, and bounding box merging and scoring. In the graph construction part, a point cloud comprises tens of thousands of points, so voxel downsampling is needed. Here I want to ask: is it better to build the graph first and then do voxel downsampling, or to do voxel downsampling first and then build the graph?
  2. In the GNN iterations, what do the blue and yellow cubes before and after the MLP in the schematic represent?
  3. In the GNN iterations, three MLPs should be required according to the iteration formula, but only two MLPs appear in the figure?
  4. In box merging and scoring, you consider partial occlusion. But can occlusion occur in a 3D point cloud?
  5. For training, you delete samples that do not contain objects of interest. Why is that?
  6. In the paper, the predictions are shown as 3D detection boxes on the image, but when I run prediction with the code, it shows 2D detection boxes on the image, right?

These are the points I do not understand. Thanks in advance for your answers.

WeijingShi commented 4 years ago

Thanks for your questions.

  1. I am not sure what you mean by "is it better to build the graph first and then do voxel downsampling". Can you elaborate on that?
  2. Those represent 1D feature vectors. Each point has its own feature vector and coordinates. The feature vector is iteratively refined.
  3. You are correct. It is an oversight in the early arXiv version. We have fixed it in the published version (the update equations are sketched after this list).
  4. It depends on how the point cloud is generated. In the self-driving vehicle context, the points are typically generated by a LiDAR mounted on the vehicle. The LiDAR emits laser pulses to measure the positions of the surrounding obstacles. An object can lie behind another object and therefore be occluded.
  5. When a point is not from an object of interest, we do not compute a bounding box for it. When a sample contains no object of interest, the localization output of the network has no supervision. Therefore, we remove those samples.
  6. Yes, run.py draws simple 2D boxes on the image. If 3D boxes are desired, vis_draw_3d_box might be helpful (see the projection sketch after this list).
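
For question 3, the per-iteration vertex update in the published version uses three MLPs, roughly as follows (my transcription of the paper's notation, so please double-check against the published equations):

```latex
\Delta x_i^{t} = MLP_h^{t}\left(s_i^{t}\right)
% per-vertex coordinate offset predicted from the vertex state
e_{ij}^{t} = MLP_f^{t}\left(\left[\,x_j - x_i + \Delta x_i^{t},\ s_j^{t}\,\right]\right)
% edge feature from the offset relative coordinates and the neighbor state
s_i^{t+1} = MLP_g^{t}\left(\max_{(i,j)\in E} e_{ij}^{t}\right) + s_i^{t}
% vertex state update after max aggregation over incoming edges
```

So MLP_h predicts the coordinate offset, MLP_f computes the edge features, and MLP_g updates the vertex state, which is why three MLPs appear per iteration.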
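For question 6, drawing 3D boxes on the image amounts to projecting the 8 box corners with the camera matrix. Below is a minimal generic sketch under KITTI camera-coordinate conventions; `project_box3d` is an illustrative helper, not the repo's API:

```python
import numpy as np

def project_box3d(P, center, size, yaw):
    """Project a 3D box (camera coordinates, KITTI convention) to pixel space.

    P      : 3x4 camera projection matrix (e.g. KITTI P2).
    center : (x, y, z) of the box bottom center in camera coordinates.
    size   : (l, h, w) = box length, height, width.
    yaw    : rotation around the camera y-axis.
    Returns an (8, 2) array of the projected corner pixels.
    """
    l, h, w = size
    # 8 corners relative to the bottom center (y points down in camera frame).
    x = np.array([ l/2,  l/2, -l/2, -l/2,  l/2,  l/2, -l/2, -l/2])
    y = np.array([ 0.0,  0.0,  0.0,  0.0,   -h,   -h,   -h,   -h])
    z = np.array([ w/2, -w/2, -w/2,  w/2,  w/2, -w/2, -w/2,  w/2])
    R = np.array([[ np.cos(yaw), 0.0, np.sin(yaw)],
                  [         0.0, 1.0,         0.0],
                  [-np.sin(yaw), 0.0, np.cos(yaw)]])
    corners = R @ np.vstack([x, y, z]) + np.asarray(center).reshape(3, 1)
    pts = P @ np.vstack([corners, np.ones((1, 8))])  # homogeneous projection
    return (pts[:2] / pts[2]).T                      # divide by depth

# Usage: uv = project_box3d(P2, center, size, yaw); then draw the 12 box
# edges between the returned corner pixels, e.g. with cv2.line.
```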

Thanks,

swzaaaaaaa commented 4 years ago

Thank you very much for your answer. I understand much more now. The one thing I still have a problem with is the first question. It should be:

  1. Is the voxel downsampling completed before the graph is constructed? I did not see it in the figure in the paper.
  2. "In order to preserve the information of the original point cloud (which would otherwise be lost by voxel downsampling), the dense point cloud is encoded as the initial state value s_i of the vertex." Is this process done after the graph is constructed?
  3. If the voxel downsampling is completed before the graph is constructed, how can the information in the original point cloud be preserved? In summary, my questions are about where voxel downsampling is applied and what role it plays. Some of my phrasing may not be clear! Thanks again for your careful answer!

WeijingShi commented 4 years ago

Given a point cloud P, the graph is constructed as:

  1. Downsample P to a smaller manageable set S, which is a subset of P
  2. For every point in S, find its radius-neighbor points in P and extract those neighbors' features as the initial feature vector of the point in S. In this way, although S is a subset of the original point cloud, the points in S contain features of the original P (a suitable radius needs to be set so that no point is missed).
  3. Connect points in S to their radius-neighbors in S to create the graph.
  4. Run the GNN layers to refine the feature vectors and detect objects from them (a minimal sketch of steps 1-3 follows).
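
A minimal numpy/scipy sketch of steps 1-3 (not the repo's implementation; `build_graph`, `voxel_size`, and `radius` are illustrative assumptions, and max-pooling is one possible aggregation):

```python
import numpy as np
from scipy.spatial import cKDTree

def build_graph(points, features, voxel_size=0.8, radius=4.0):
    """Downsample P to S, encode initial vertex states from P, connect S."""
    # Step 1: voxel downsampling -- keep one representative point per occupied voxel.
    voxels = np.floor(points / voxel_size).astype(np.int64)
    _, keep = np.unique(voxels, axis=0, return_index=True)
    S = points[keep]

    # Step 2: the initial state s_i aggregates the features of radius-neighbors
    # in the ORIGINAL cloud P, so the dense cloud's information is preserved.
    tree_P = cKDTree(points)
    states = np.stack([features[idx].max(axis=0)
                       for idx in tree_P.query_ball_point(S, r=radius)])

    # Step 3: edges connect vertices of S that lie within the radius.
    tree_S = cKDTree(S)
    edges = np.array(sorted(tree_S.query_pairs(r=radius)))
    return S, states, edges
```

Each vertex is its own radius-neighbor in P, so every initial state is well defined; a larger radius trades computation for better coverage of P.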

I hope this helps.

swzaaaaaaa commented 4 years ago

I see. Thank you very much.