SysCV / shift-dev

SHIFT Dataset DevKit - CVPR2022
https://www.vis.xyz/shift
MIT License

Open LiDAR Point Clouds #36

Closed Cram3r95 closed 1 year ago

Cram3r95 commented 1 year ago

Hi everyone,

Thanks for your amazing work. I would like to train OpenPCDet models on your synthetic data, so I need to convert your LiDAR point clouds to .bin files (to match the KITTI format).

How can I get started with your point clouds? So far, I have decompressed lidar.zip into an HDF5 file, but I cannot extract the PLY files from it. How can I read them?

Thanks in advance, Carlos

suniique commented 1 year ago

Hey @Cram3r95, first, thanks for your interest in using SHIFT!

How did you get the LiDAR point clouds? On our website and in the download script, the data should be a zip file of all the PLY point clouds. You can extract the zip (or part of it) to obtain the PLY files directly. PLY is a standard 3D point cloud format supported by many 3D software packages, such as CloudCompare, MeshLab, or Blender.
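If only part of the archive is needed, the extraction can also be scripted. Below is a minimal sketch using Python's standard zipfile module. The helper name `extract_ply_members` and the member names in the demo are hypothetical stand-ins, not the actual layout of lidar.zip.

```python
import io
import tempfile
import zipfile


def extract_ply_members(zip_source, out_dir, limit=None):
    """Extract only the .ply members of an archive, optionally just the first `limit`."""
    with zipfile.ZipFile(zip_source) as zf:
        names = [n for n in zf.namelist() if n.lower().endswith(".ply")]
        if limit is not None:
            names = names[:limit]
        zf.extractall(out_dir, members=names)
    return names


# Demo with a tiny archive built in memory (a stand-in for the real zip;
# the member names below are made up for illustration).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("seq-a/00000000_lidar.ply", b"ply\n")
    zf.writestr("seq-a/00000010_lidar.ply", b"ply\n")
    zf.writestr("seq-a/notes.txt", b"not a point cloud")
buf.seek(0)

with tempfile.TemporaryDirectory() as tmp:
    extracted = extract_ply_members(buf, tmp, limit=1)
    print(extracted)
```

With the real data, `zip_source` would be the path to the downloaded zip, and `limit=None` extracts every PLY file.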

In Python, I suggest using open3d to open the point cloud files. I've included a code snippet below that may be helpful. As for KITTI's bin format, I don't know its exact definition, but, as suggested by other users, you can try something like the following.

import numpy as np
import open3d as o3d

pcd = o3d.io.read_point_cloud('xxxx-xxxx.ply')
xyz = np.asarray(pcd.points, dtype=np.float32)
# xyz: np.array of shape (n, 3), with each row of [x, y, z].
# Note: open3d's PointCloud only exposes points, colors, and normals, so the
# intensity channel is not available here; it is filled with zeros below.
# Use plyfile instead if you need the real intensity values.

# Converting to KITTI bin, suggested in https://github.com/PRBonn/lidar-bonnetal/issues/78.
# I have yet to run it personally.
arr = np.zeros((xyz.shape[0], 4), dtype=np.float32)  # rows of [x, y, z, intensity]
arr[:, :3] = xyz
# A C-contiguous (n, 4) float32 array is written as x, y, z, intensity per point.
arr.tofile('xxxx-xxxx.bin')
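To check that a written file has the expected layout, it can be read back and compared. The sketch below assumes the usual KITTI Velodyne convention of consecutive float32 values (x, y, z, intensity) per point. The helper names `save_kitti_bin` and `load_kitti_bin` are hypothetical, and whether OpenPCDet also expects a particular coordinate frame or intensity range is not verified here.

```python
import os
import tempfile

import numpy as np


def save_kitti_bin(points_xyzi, path):
    """Write an (n, 4) array as flat float32 values: x, y, z, intensity per point."""
    pts = np.ascontiguousarray(points_xyzi, dtype=np.float32)
    if pts.ndim != 2 or pts.shape[1] != 4:
        raise ValueError(f"expected shape (n, 4), got {pts.shape}")
    pts.tofile(path)


def load_kitti_bin(path):
    """Read a flat float32 file back into an (n, 4) array."""
    return np.fromfile(path, dtype=np.float32).reshape(-1, 4)


# Round trip on two made-up points.
points = np.array([[1.0, 2.0, 3.0, 0.5],
                   [4.0, 5.0, 6.0, 0.25]], dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp:
    bin_path = os.path.join(tmp, "example.bin")
    save_kitti_bin(points, bin_path)
    file_size = os.path.getsize(bin_path)  # 4 floats * 4 bytes per point
    restored = load_kitti_bin(bin_path)

print(restored.shape, file_size)
```

Writing the contiguous (n, 4) array directly gives the same byte layout as filling a flat array with strides of four.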

Please tell me if you're still confused about the LiDAR data! I will be more than happy to help.

Cram3r95 commented 1 year ago

Hi @suniique

Thank you for your quick answer! My colleague has this file in HDF5 format, with the following keys:

[image]

Each string represents a sequence of frames.

  1. Each sequence of frames:

[image]

is a "Dataset"-like object with 50 PLY files.

  2. Our problem is how to read this PLY file in order to:

But the following code is not working:

[image]

Do you know why? Alternatively, I can try to obtain the PLY files directly instead of going through an intermediate HDF5 file.

suniique commented 1 year ago

@Cram3r95 Aha, I see your problem. The issue lies in the open3d library, which only supports reading point clouds from the file system (i.e., files on disk), not from a Python I/O buffer (like what you get from HDF5), as indicated in https://github.com/isl-org/Open3D/issues/1146.

Thus, I suggest either extracting the files locally before processing them with open3d, or using another library such as plyfile. For plyfile, you can have a look at the following code:

import io

import numpy as np
import plyfile

# `hdf5` is an open HDF5 file object (e.g. from h5py) and `name` is the key of one PLY entry.
buffer = io.BytesIO(np.array(hdf5[name]))  # create an IO buffer
plydata = plyfile.PlyData.read(buffer)     # parse point cloud from the buffer

vertex = plydata['vertex']
num_points = vertex.count
arr = np.zeros((num_points, 4), dtype=np.float32)  # rows of [x, y, z, intensity]
arr[:, 0] = vertex.data['x']
arr[:, 1] = vertex.data['y']
arr[:, 2] = vertex.data['z']
arr[:, 3] = vertex.data['intensity']

# ...

Again, if you need to process our data before training/testing, we don't recommend converting it into HDF5 first, since that complicates the loading. HDF5 is only helpful if you run into I/O performance issues during training (because of too many files saved on disk).

Cram3r95 commented 1 year ago

Thank you so much @suniique, I really appreciate your answer, quite useful.

suniique commented 1 year ago

OK, I'll close the issue now. If you have further questions, feel free to reopen it!

suniique commented 1 year ago

Hey @santimontiel, thanks for the question!

I used the same code for our projects, and it seems to work well. Could you check the fields of the PLY file you get, e.g., with print(plydata["vertex"].data.dtype)? On my side, it prints something like

[('x', '<f4'), ('y', '<f4'), ('z', '<f4'), ('intensity', '<f4')]

Another idea is to replace np.array with bytearray in the code that creates the IO buffer. bytearray is a Python built-in type, which may be more compatible across platforms.
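Both checks can be sketched without any PLY file at hand. The snippet below builds a small structured array with the dtype shown above as a stand-in for plydata["vertex"].data, verifies the field names before indexing, and wraps some bytes with bytearray. The helper `vertex_to_xyzi` is hypothetical, and storing the PLY bytes as a uint8 array is only an assumption about how the HDF5 entry might look.

```python
import io

import numpy as np

FIELDS = ("x", "y", "z", "intensity")


def vertex_to_xyzi(vertex_data):
    """Convert a structured vertex array into a plain (n, 4) float32 array."""
    available = vertex_data.dtype.names or ()
    missing = [f for f in FIELDS if f not in available]
    if missing:
        raise KeyError(f"missing fields {missing}; available: {available}")
    return np.stack([vertex_data[f] for f in FIELDS], axis=1).astype(np.float32)


# Stand-in for plydata["vertex"].data, using the dtype printed above.
vertex = np.array(
    [(1.0, 2.0, 3.0, 0.5), (4.0, 5.0, 6.0, 0.25)],
    dtype=[("x", "<f4"), ("y", "<f4"), ("z", "<f4"), ("intensity", "<f4")],
)
xyzi = vertex_to_xyzi(vertex)
print(xyzi.shape, xyzi.dtype)

# Creating the IO buffer with bytearray instead of np.array.
# Assumption: the raw PLY bytes are available as a uint8 array.
raw = np.frombuffer(b"ply\n", dtype=np.uint8)
buffer = io.BytesIO(bytearray(raw))
header_start = buffer.read()
```

If the KeyError fires, its message lists the fields that are actually present, which is usually enough to spot a differently named intensity channel.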