DeyvidKochanov-TomTom / kprnet

MIT License

Using a non-Velodyne LiDAR's range image with KPRNet #9

Open mvish7 opened 3 years ago

mvish7 commented 3 years ago

Hi, please find some background for my question first.

I'm trying to reimplement KPRNet for a non-Velodyne LiDAR dataset. Due to multiple factors (e.g. sensor type, sensor placement), our dataset does not have a full 360° scan, so we get no projections at the start and end of the range image. Please find a sample range image below.

[Figure: sample range image]

So we get no projections in the first 150-200 columns and the last 400-500 columns.
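A minimal sketch of how those empty leading and trailing columns could be located, assuming the range image is a NumPy array of shape (H, W) in which pixels without a projected point hold a fill value of -1. The fill value and the helper name `valid_column_span` are hypothetical; use whatever your projection writes into empty pixels.

```python
import numpy as np


def valid_column_span(range_img: np.ndarray, empty_value: float = -1.0) -> tuple:
    """Return (first, last_exclusive) column indices that contain any projected point.

    Assumes empty pixels hold `empty_value` (hypothetical convention).
    """
    has_point = (range_img != empty_value).any(axis=0)  # one bool per column
    cols = np.flatnonzero(has_point)
    if cols.size == 0:
        return 0, 0  # image is completely empty
    return int(cols[0]), int(cols[-1]) + 1
```

The returned span can then serve as the crop bounds if you decide to crop.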

Now, coming to the question: to make the best use of KPRNet, should we crop the range image, i.e. keep only the part of the range image where projections are available and remove the empty columns at the start and end?

My question is based on the functionality of do_range_projection and kd_tree creation (both in the dataloader), and of resample_grid and the kpconv layer (both in KPClassifier).

My thought process behind this question:

  1. In the do_range_projection function, the (sort of) pixel_ids of the range image's x and y coordinates (px and py) are created and saved. From my understanding (please correct me if I'm wrong), these px and py pixel_ids are essentially the 2D coordinates of the raw points.
  2. During kd_tree creation, the raw LiDAR points are used to build the kd_tree and find their nearest neighbours.
  3. In the resample_grid function, these px and py pixel_ids are used to create a grid, and the feature map (the range image processed by the CNN) is used to fill in the values of this grid.
  4. Lastly, the kpconv layers use the raw points, their neighbours, and the features around the raw points for point convolution.

So, following this thought process, I believe that if we don't crop the range image, we involve many harmful pixels (from regions of the range image where no data is available) in calculations such as the 3D-2D projection, K-NN, the forward pass through the CNN, the 2D-3D projection, and KPConv. I feel this is not ideal for the network.

Could you please advise whether we should crop the range image? If yes, could you suggest what should be considered when cropping it?

Thanks

DeyvidKochanov-TomTom commented 3 years ago

> In the do_range_projection function: the (sort of) pixel_ids of x and y coordinates of range image (px and py) are created and saved. From my understanding (please correct me if wrong) these px and py pixel_ids are sort of 2d coordinates of raw points.
> In resample_grid function: these px and py pixel_ids are used to create a grid and the feature map (range image processed by CNN) is used to fill the values in this grid.

Indeed, px and py are the 2D coordinates of the 3D points. When you project, make sure they are in the same order as the original 3D points. The resample grid function extracts the features from the CNN using bilinear interpolation. Its output is Nx1xC, where N is the number of original 3D points and C is the number of feature channels from the CNN. For the resample grid, make sure px and py are first transformed to lie between -1 and 1, since that is the coordinate system F.grid_sample uses.
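The two steps described above (projecting points to continuous px, py coordinates while keeping the original point order, then normalizing them to [-1, 1] for `F.grid_sample`) can be sketched as follows. This is not KPRNet's code: the projection follows the common SemanticKITTI-style spherical projection, the default image size and vertical field of view (64 x 2048, +3°/-25°) are assumptions that a non-Velodyne sensor would replace with its own values, and the feature map is assumed to have the same H x W as the range image. The helper names are hypothetical.

```python
import numpy as np
import torch
import torch.nn.functional as F


def spherical_projection(points, H=64, W=2048, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Return continuous pixel coordinates (px, py), one pair per point.

    px[i], py[i] belong to points[i], so the original point order is kept.
    Image size and FOV defaults are assumptions (SemanticKITTI-like).
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.maximum(np.linalg.norm(points[:, :3], axis=1), 1e-8)  # avoid division by zero
    yaw = -np.arctan2(y, x)                          # horizontal angle in [-pi, pi]
    pitch = np.arcsin(np.clip(z / r, -1.0, 1.0))     # vertical angle
    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    fov = fov_up - fov_down
    px = 0.5 * (yaw / np.pi + 1.0) * W               # column coordinate
    py = (1.0 - (pitch - fov_down) / fov) * H        # row coordinate
    # Clamp to the pixel grid; points outside the vertical FOV land on border rows.
    return np.clip(px, 0, W - 1), np.clip(py, 0, H - 1)


def resample_features(feat, px, py):
    """Bilinearly sample a (1, C, H, W) feature map at N pixel positions.

    Returns an (N, C) tensor, one feature vector per original 3D point.
    """
    _, _, H, W = feat.shape
    px = torch.as_tensor(px, dtype=feat.dtype, device=feat.device)
    py = torch.as_tensor(py, dtype=feat.dtype, device=feat.device)
    # With align_corners=True, -1 and +1 are the centres of the first and last pixels.
    gx = 2.0 * px / (W - 1) - 1.0
    gy = 2.0 * py / (H - 1) - 1.0
    grid = torch.stack([gx, gy], dim=-1).view(1, -1, 1, 2)  # (1, N, 1, 2), (x, y) order
    out = F.grid_sample(feat, grid, mode="bilinear", align_corners=True)  # (1, C, N, 1)
    return out[0, :, :, 0].t()  # (N, C)
```

Note that `grid_sample` expects the last grid dimension in (x, y) order, i.e. (column, row), which is why gx comes first.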

> While Kd_tree creation raw lidar points are used to create kd_tree and find its nearest neighbours.
> Lastly, kpconv layers raw points, their neighbours and features around the raw points for point convolution.

Yes, the kpconv layer requires the indices of the K nearest neighbors of each point, so we use the kd tree to compute them. The kpconv layer takes the 3D points, the KNN indices, and the corresponding CNN features extracted by resample grid, and makes the final predictions. Actually, during training we also pad the 3D points and the KNN indices so that every batch sample has the same number of points. This is done in a somewhat hacky way here.
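A minimal sketch of the neighbour search and of how the KNN indices select per-point CNN features for a point convolution, assuming SciPy's `cKDTree` serves as the kd tree and k = 16. Both the library choice and k are assumptions, and the helper names are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree


def knn_indices(points, k=16):
    """Indices of the k nearest neighbours of every point (the point itself included)."""
    xyz = points[:, :3]
    tree = cKDTree(xyz)
    _, idx = tree.query(xyz, k=k)  # (N, k) for k > 1
    return idx.reshape(len(xyz), k).astype(np.int64)


def gather_neighbor_features(features, idx):
    """features: (N, C) per-point CNN features; idx: (N, k) -> (N, k, C)."""
    return features[idx]
```

The resulting (N, k, C) neighbourhood tensor, together with the point coordinates, is the kind of input a KPConv-style layer consumes; the kernel-point weighting itself is not shown here.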

> So from this thought process: I believe if we don't crop the range image then we are involving many harmful pixels (from range image where no data is available) in calculations such as 3D-2D projection, K-NN, Forward pass through CNN, 2D-3D projection and KPConv. I feel somehow this is not ideal for the network.

Your extra range projection pixels are only seen by the CNN part and are discarded later in the grid_resample function. They shouldn't be a problem for the CNN, but you could save some computation by removing them. You can do this in the projection; you will then have to adjust the hardcoded values in the dataloader. Also, the train_transform function already does some random cropping during training.
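If you do crop in the projection, the bookkeeping could look roughly like this, assuming the range image is stored as (H, W) or (C, H, W) and px holds column coordinates in the uncropped image. The helper `crop_range_columns` is hypothetical: it slices the columns and shifts px by the crop start so every point still indexes the right pixel. The shifted px must then be normalized with the cropped width before it is passed to grid_sample.

```python
import numpy as np


def crop_range_columns(range_img, px, col_start, col_end):
    """Keep columns [col_start, col_end) and re-index the column coordinates px."""
    # Every point must still fall on a kept pixel centre range after cropping.
    if np.any(px < col_start) or np.any(px > col_end - 1):
        raise ValueError("some points project outside the crop window")
    cropped = range_img[..., col_start:col_end]  # works for (H, W) and (C, H, W)
    return cropped, px - col_start
```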

huixiancheng commented 3 years ago

Hi again, dear @DeyvidKochanov-TomTom! Why is the value here 38000? I know it's there to make the number of points the same in every batch sample, but why 38000? I think it's about 1/4 of the total number of points, since we randomly crop 1/4 of the range image. Is this right?

DeyvidKochanov-TomTom commented 3 years ago

Hi, yes, the random crop should contain something like 33k points, but we just pick a number that is slightly bigger :smile:
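The padding itself can be sketched as below, assuming 38000 as the fixed size (the value discussed above) and assuming padded KNN rows simply point at themselves so that gathers stay in bounds. KPRNet's own padding may differ; `pad_sample` is a hypothetical helper, and the returned mask is there so padded points can be excluded from the loss.

```python
import numpy as np

MAX_POINTS = 38000  # fixed per-sample size, slightly above the ~33k points of a random crop


def pad_sample(points, knn_idx, max_points=MAX_POINTS):
    """Pad points (N, D) and KNN indices (N, k) to max_points rows.

    Returns padded points, padded indices and a boolean mask of real points.
    """
    n, k = knn_idx.shape
    if n > max_points:
        raise ValueError(f"{n} points exceed max_points={max_points}")
    pad = max_points - n
    points_p = np.concatenate(
        [points, np.zeros((pad, points.shape[1]), dtype=points.dtype)], axis=0)
    # Each padded row references only itself, so indexing with it never goes out of range.
    self_idx = np.arange(n, max_points, dtype=knn_idx.dtype)[:, None]
    knn_p = np.concatenate([knn_idx, np.repeat(self_idx, k, axis=1)], axis=0)
    valid = np.zeros(max_points, dtype=bool)
    valid[:n] = True
    return points_p, knn_p, valid
```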

huixiancheng commented 3 years ago

OK!:ok_hand: