PointCloudLibrary / pcl

Point Cloud Library (PCL)
https://pointclouds.org/

[io] PCD file saved with savePCDFileBinaryCompressed can't be opened (coredump) #4958


mickeyouyou commented 3 years ago

Describe the bug

We use pcl::io::savePCDFileBinaryCompressed(label_pcd_Name, *cloud_out); to save our PCD files, and in most situations it works well. But a few output files (2 PCD files out of 100k) fail to load with a coredump error like:

Loading /mnt/data/Lidar_B3_1632536240.453701.pcd [1] 24581 bus error (core dumped)

This happens even when we use different tools to open them.

Context


PCD file header:

# .PCD v0.7 - Point Cloud Data file format
VERSION 0.7
FIELDS x y z intensity timestamp ring
SIZE 4 4 4 1 8 2
TYPE F F F U F U
COUNT 1 1 1 1 1 1
WIDTH 317794
HEIGHT 1
VIEWPOINT 0 0 0 1 0 0 0
POINTS 317794
DATA binary_compressed

Although the header declares 317,794 points, the file is only 2.2 MB, about half the size of a normal file (roughly 4.5 MB).
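A quick size estimate (my own arithmetic, derived from the SIZE row above, not from the report): each point occupies 4+4+4+1+8+2 = 23 bytes, so the uncompressed payload should be 317794 * 23 = 7,309,262 bytes, about 7.0 MiB. A compressed file of ~4.5 MB is therefore plausible, while 2.2 MB looks more like truncation than unusually good compression.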

pcl_viewer:

(base) ➜  build git:(dev) /usr/bin/pcl_viewer /mnt/data/Lidar_B3_1632536240.453701.pcd 
The viewer window provides interactive commands; for help, press 'h' or 'H' from within the window.
> Loading /mnt/data/Lidar_B3_1632536240.453701.pcd [1]    24581 bus error (core dumped)  /usr/bin/pcl_viewer /mnt/data/Lidar_B3_1632536240.453701.pcd

CloudCompare: 2021928-91222

gdb backtrace:

(gdb) where
#0  0x00007ffff6e8311a in pcl::lzfDecompress(void const*, unsigned int, void*, unsigned int) () from /usr/lib/x86_64-linux-gnu/libpcl_io.so.1.8
#1  0x00007ffff6e40e7e in pcl::PCDReader::read(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, pcl::PCLPointCloud2&, Eigen::Matrix<float, 4, 1, 0, 4, 1>&, Eigen::Quaternion<float, 0>&, int&, int) () from /usr/lib/x86_64-linux-gnu/libpcl_io.so.1.8
#2  0x0000555555572cbf in main ()
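
This crash is consistent with a truncated file: as far as I know, the binary_compressed data section begins with two little-endian 32-bit unsigned integers (compressed size, then uncompressed size) followed by the LZF block, and PCL 1.8 hands that block to lzfDecompress without first checking that it is fully present on disk, so a short file makes the decompressor read past the end of the buffer. A minimal standalone check along those lines (a sketch only; the helper name and the plain-ifstream header parsing are mine, not PCL API):

#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical helper: returns true if the binary_compressed payload of
// a PCD file is fully present on disk. Assumes a little-endian host.
bool pcdCompressedPayloadComplete(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return false;

    // Skip the ASCII header; the payload starts right after the DATA line.
    std::string line;
    while (std::getline(in, line))
        if (line.rfind("DATA", 0) == 0)
            break;
    if (line.rfind("DATA binary_compressed", 0) != 0)
        return false;  // not a binary_compressed file

    // Payload layout: uint32 compressed size, uint32 uncompressed size,
    // then the LZF-compressed block itself.
    std::uint32_t compressed = 0, uncompressed = 0;
    in.read(reinterpret_cast<char*>(&compressed), sizeof compressed);
    in.read(reinterpret_cast<char*>(&uncompressed), sizeof uncompressed);
    if (!in)
        return false;

    const std::streamoff payload_start = in.tellg();
    in.seekg(0, std::ios::end);
    const std::streamoff file_size = in.tellg();

    std::cout << path << ": compressed=" << compressed
              << " uncompressed=" << uncompressed
              << " bytes available=" << (file_size - payload_start) << '\n';

    // A truncated file has fewer payload bytes on disk than the header claims.
    return file_size - payload_start >= static_cast<std::streamoff>(compressed);
}

Run over the attached good/bad pair, the bad file should report fewer available bytes than its recorded compressed size.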

Expected behavior

The PCD file is read and all points load correctly.

Current Behavior

Reading the PCD file crashes with a coredump instead.


Screenshots/Code snippets


pcl::io::savePCDFileBinaryCompressed(fileName, *cloud_out);
std::cout << "save frame: " << frameItem << " to pcd: " << fileName << std::endl;
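
One cheap way to catch a corrupted file at write time (my own suggestion, not part of the original code) is to read each file back immediately after saving it and compare point counts; PointT below stands in for the project's custom point type:

#include <iostream>
#include <string>
#include <pcl/io/pcd_io.h>
#include <pcl/point_cloud.h>

// Hypothetical helper: save a cloud, then verify the file by reloading it.
// Both savePCDFileBinaryCompressed and loadPCDFile return < 0 on failure.
template <typename PointT>
bool saveAndVerify(const std::string& fileName,
                   const pcl::PointCloud<PointT>& cloud)
{
    if (pcl::io::savePCDFileBinaryCompressed(fileName, cloud) < 0)
    {
        std::cerr << "write failed: " << fileName << std::endl;
        return false;
    }
    pcl::PointCloud<PointT> check;
    if (pcl::io::loadPCDFile(fileName, check) < 0 || check.size() != cloud.size())
    {
        std::cerr << "verification failed: " << fileName << std::endl;
        return false;
    }
    return true;
}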

Your Environment:

The files are read with the system PCL 1.8 on Ubuntu x86_64 (per the backtrace: /usr/lib/x86_64-linux-gnu/libpcl_io.so.1.8) and written inside a Docker container.


Additional context


The original PCD file is attached: Lidar_B3_1632536240.453701.zip

A normal/corrupted pair of PCD files: normal_and_coredump_2pcd.zip


mvieth commented 3 years ago

Are you absolutely sure that you use PCL 1.9 to write the files? I found this pull request: https://github.com/PointCloudLibrary/pcl/pull/2325. The description there (and in the linked issue) sounds just like what you are experiencing. If possible, please also check whether the problem occurs with PCL 1.12.0.

mvieth commented 3 years ago
  • Do you only use Ubuntu (both for generating and reading the files)?
  • Is it possible that there was not enough disk space while writing the PCD files? Or could there be some other hardware limitation like too low write speed?
  • I noticed that the bad file contains (supposedly) more points than the good file (317794 vs 317253). Judging from your other good and bad files, is it a pattern that the bad files contain more points than the good files?
  • Can you reproduce the bad files in some way, or is it completely random whether a written file is bad or good?

BTW: starting with PCL 1.9.0, you get a nice error message instead of just a coredump when trying to read a corrupted file.

mvieth commented 3 years ago

I checked the code that writes the binary compressed PCD files, and it looks okay. Every IO operation is checked for success, and an error is printed or an exception thrown on failure; but if I understood you correctly, there are no errors while writing the PCD files, right? I noticed that the bad/corrupted PCD file is exactly 2097152 bytes large, which is 2^21. My best guess is that something goes wrong with the AWS object storage: the first chunk(s) of the file are stored correctly, but the chunks after that somehow go missing. So my suggestion is that you try writing the files to "normal" disk space instead of the AWS object storage, and see whether there are still bad/corrupted files.
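
If the truncation signature really is "size is an exact multiple of a 2 MiB chunk", a quick scan can flag candidates (a sketch using C++17 std::filesystem; the directory path and the chunk-multiple heuristic are assumptions of mine, not anything PCL provides):

#include <cstdint>
#include <filesystem>
#include <iostream>

int main()
{
    namespace fs = std::filesystem;
    constexpr std::uintmax_t chunk = 1u << 21;  // 2097152 bytes, like the bad file

    // Flag every .pcd whose size is an exact multiple of the chunk size;
    // a legitimately compressed file is unlikely to land exactly on that boundary.
    for (const auto& entry : fs::directory_iterator("/mnt/data"))
    {
        if (!entry.is_regular_file() || entry.path().extension() != ".pcd")
            continue;
        const std::uintmax_t size = entry.file_size();
        if (size % chunk == 0)
            std::cout << "suspicious: " << entry.path().string()
                      << " (" << size << " bytes)\n";
    }
}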

mickeyouyou commented 3 years ago

Acctually object storage is prallell operation, we generate it in local file system(docker container) firstly and move it to storage one time way. So I don't think it is an problem of object storage.

mvieth commented 3 years ago

Can you then verify whether bad/corrupted files already exist directly after writing them to the local file system, or whether they only appear after moving them to the external storage? Also, since you are not writing directly to AWS object storage as you previously said, are you still absolutely sure that there is enough free space where the files are written? You could also try saving your files as binary (uncompressed) files instead and check whether any bad/corrupted files turn up (e.g. files much smaller than the others).
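
For the uncompressed test, the call is a drop-in replacement (a sketch, reusing the variable names from the snippet above):

// Same call shape as the compressed variant; writes DATA binary instead.
if (pcl::io::savePCDFileBinary(fileName, *cloud_out) < 0)
    std::cerr << "write failed: " << fileName << std::endl;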

mickeyouyou commented 3 years ago

Thank you so much for your reply!

  1. I tested it another way to try to reproduce the issue: I called pcl::io::savePCDFileBinaryCompressed 50 times for every frame and saved every PCD file to the local file system. That produced 3000 (frames) * 50 = 150,000 PCD files, but I did not find any bad files.
  2. Yes, saving in binary (uncompressed) format is our second option; we tested it and it works well. In the end we reproduced these bad frames only with binary compressed.
  3. This is a very difficult bug to reproduce: among roughly 22,000 separately generated PCDs, only 1 or 2 were bad.