PRBonn / point-cloud-prediction

Self-supervised Point Cloud Prediction Using 3D Spatio-temporal Convolutional Networks
https://www.ipb.uni-bonn.de/wp-content/papercite-data/pdf/mersch2021corl.pdf
MIT License
141 stars 25 forks

got an error when using my own dataset #9

Closed xutianq999 closed 1 year ago

xutianq999 commented 1 year ago

Hi Benedikt, thank you so much for this work. I have managed training and prediction with the help of your documentation, but I ran into difficulties when I wanted to use it on non-LiDAR point cloud data. The data comes from my depth camera (Bumblebee XB3); below is a view of it opened in CloudCompare, stored in txt format. [screenshot of the point cloud in CloudCompare] The data contained in it is x y z intensity. It may not be obvious, but it is a landslide from an indoor experiment, and I hope to predict the landslide with this work. My problem is that the preprocessing reports the following error when I convert the txt file into a .bin file. [screenshot of the error] I use numpy for the conversion: `point_cloud = numpy.loadtxt("try.txt")` followed by `point_cloud.tofile("try.bin")`.
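
For reference, a minimal sketch of that conversion (the file names are placeholders), assuming the flat float32 x, y, z, intensity layout that KITTI-style .bin files use:

```python
import numpy as np

# Load the ASCII file with one "x y z intensity" line per point.
point_cloud = np.loadtxt("try.txt", dtype=np.float32)

# KITTI-style .bin files store the points as a flat float32 array,
# so make sure the dtype is float32 before writing.
point_cloud.tofile("try.bin")
```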

When I instead use CloudCompare to convert the txt file into a .pcd, add the intensity field to the header, and then convert the .pcd into a .bin with another script, the preprocessing step above runs through, but the dataloader does not read the data. [screenshots of the dataloader output] You can also see that the computed averages are all 0. Could you please give me a suggestion or solution?

benemer commented 1 year ago

Hey @xutianq999!

The error you get during preprocessing occurs because we assume 3D points in the local frame of a rotating laser scanner, with the field of view defined in parameters.yaml. During preprocessing, the point cloud is represented as a range image using a spherical projection.

You could try to process the disparity image of your stereo camera instead of the resulting 3D points. Given the baseline and the focal length of your setup, you can compute the depth values resulting in a 2D depth image. It might be easier to write your own dataloader. Just make sure it returns past and future images and the required meta data. The images should be torch.tensor objects of size (n_channels, n_steps, height, width), where the channel could be the disparity/depth in your case.
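
As a rough sketch of such a dataloader (this is not the repository's implementation; the class name, file format, and dictionary keys are placeholders that would need to match what the training code expects):

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class DepthSequenceDataset(Dataset):
    """Sketch: serves past and future depth images as tensors of size
    (n_channels, n_steps, height, width), with one depth channel."""

    def __init__(self, depth_files, n_past, n_future):
        self.depth_files = sorted(depth_files)  # one depth map per time step
        self.n_past = n_past
        self.n_future = n_future

    def __len__(self):
        return len(self.depth_files) - self.n_past - self.n_future + 1

    def _load(self, idx):
        # Placeholder: load one (height, width) float32 depth map.
        return np.load(self.depth_files[idx]).astype(np.float32)

    def __getitem__(self, idx):
        past = [self._load(idx + t) for t in range(self.n_past)]
        future = [self._load(idx + self.n_past + t) for t in range(self.n_future)]
        # Stack along the time axis and add a channel axis:
        # (n_channels=1, n_steps, height, width)
        past = torch.from_numpy(np.stack(past)).unsqueeze(0)
        future = torch.from_numpy(np.stack(future)).unsqueeze(0)
        return {"past": past, "future": future, "index": idx}
```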

Also, you should disable the CIRCULAR_PADDING, since this only holds for images with a 360-degree horizontal field of view.

Best, Benedikt

xutianq999 commented 1 year ago

Thank you very much. I have been troubled by this problem for a long time and have tried many approaches; I did not expect you to reply so quickly. Once again, I would like to express my great respect to you. Regarding your suggestion: when I was reading the code, I noticed the part that forms the range image by projecting the 3D points, and I also tried to change some properties in the yaml file, but did not achieve good results. I noticed that you also mention in your paper that CIRCULAR_PADDING improves the results on mechanical LiDAR data to a certain extent. Please forgive me for confirming with you again: I should rewrite the dataloader so that it does not read 3D point cloud files such as .bin files, but directly takes depth maps as input, right? As for the landslide data I showed above, can I convert it into depth maps and then use such data directly for training?

Thank you again for your reply; you have given me great encouragement and courage.

benemer commented 1 year ago

You are welcome! Yes, the circular pattern helps to propagate features from one side of the range image to the other one. This is important since the rotating LiDAR covers the whole 360 degrees in the azimuthal direction, which means that the left and right range image boundaries are actually connected. Since this does not hold for your depth image, you should disable the circular padding.
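
As an illustration of the two padding choices (not necessarily how the repository implements it), circular padding wraps the range image around the azimuth axis, while zero padding leaves the borders unconnected:

```python
import torch
import torch.nn.functional as F

# Dummy range image batch: (batch, channels, time, height, width).
x = torch.rand(1, 1, 5, 64, 512)

# Circular padding along the width (azimuth) axis: the left and right borders
# are treated as connected, which matches a full 360-degree scan.
x_circular = F.pad(x, (1, 1, 0, 0, 0, 0), mode="circular")

# Zero padding: the appropriate choice for a limited field of view, e.g. a
# depth camera, where the image borders are not connected.
x_zero = F.pad(x, (1, 1, 0, 0, 0, 0), mode="constant", value=0.0)
```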

Regarding your problem: yes, instead of reading and projecting 3D point clouds, you can directly load the depth maps in the dataloader. Just modify the __getitem__ as mentioned above. Also, the MIN_RANGE and MAX_RANGE in parameters.yaml are used as limits for the prediction; if you want to predict depth, these should be the minimum and maximum depth you want to predict. You also need to change some parameters such as HEIGHT and WIDTH to match your depth maps.
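
As a rough illustration, the relevant entries could look like this (the values are placeholders for your camera, and the keys should be placed in the structure of the existing parameters file):

```yaml
HEIGHT: 480              # rows of your depth map
WIDTH: 640               # columns of your depth map
MIN_RANGE: 0.3           # minimum depth to predict, in meters
MAX_RANGE: 10.0          # maximum depth to predict, in meters
CIRCULAR_PADDING: False  # no 360-degree wrap-around for a depth camera
```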

For now, I would train the model without using the Chamfer Distance loss and only use the range/depth L1 loss and the valid mask loss. If you want to use Chamfer Distance loss as well, you need to modify the get_target_mask_from_range_view method to get the 3D point cloud from your predicted depth map.
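
As a starting point for that modification, a pinhole back-projection could look roughly like this (a sketch only; fx, fy, cx, cy are your camera intrinsics, and the function is not part of the codebase):

```python
import torch

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a (height, width) depth map into an (N, 3) point cloud
    using a pinhole camera model. Pixels with non-positive depth are dropped."""
    height, width = depth.shape
    v, u = torch.meshgrid(
        torch.arange(height, dtype=depth.dtype),
        torch.arange(width, dtype=depth.dtype),
        indexing="ij",
    )
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = torch.stack((x, y, z), dim=-1).reshape(-1, 3)
    return points[points[:, 2] > 0.0]
```

The resulting point cloud could then be fed to the Chamfer Distance computation in place of the spherical back-projection used for the rotating LiDAR.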

So, there are some modifications needed to make this work with depth maps, but it should be possible. I actually used this codebase and modified it for the Waymo Occupancy and Flow Prediction Challenge which worked quite well.

Best regards, Benedikt

yzysmile commented 9 months ago

Hi Benedikt, I read through your conversation above. I have the following thoughts, and I am not sure whether they are correct.

  1. If I input depth maps directly for training, there is no need for the projection step, and I do not need to compute the depth values from the baseline and focal length, because each depth map already stores the depth values.
  2. Each ground-truth mask has the same size as the depth map, and the value at each pixel is 1, because each pixel of the depth map corresponds to a 3D point.
  3. Depth images can be converted to point clouds using the camera intrinsics, so I can still compute the Chamfer Distance loss easily.

Looking forward to your reply.

Best wishes, yzy

benemer commented 9 months ago

Hey,

  1. Yes, indeed. You just have to modify the dataloader to read your depth maps. I did not try to train it on depth maps, but I don't see a reason why it should not work.
  2. If a pixel has no depth information or an invalid depth, the mask should be zero there so that these pixels are ignored. This happens, for example, for the sky (see the sketch after this list).
  3. If you compute the 3D points, this should be possible. Again, you have to modify the code for this.
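
A minimal sketch of such a mask, assuming invalid pixels are stored as zero or NaN depth (adapt this to how your sensor marks missing returns):

```python
import torch

def valid_mask(depth):
    """Return 1.0 where the depth map holds a valid measurement, 0.0 otherwise."""
    return (torch.isfinite(depth) & (depth > 0.0)).float()
```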

Best, Benedikt

yzysmile commented 9 months ago

Thank you for your quick replies one by one. I will try to modify the code and wish you a Merry Christmas and good scientific research!

Best, yzy