PointCloudLibrary / pcl

Point Cloud Library (PCL)
https://pointclouds.org/
Other
9.95k stars 4.62k forks source link

[conversions] toPCLPointCloud2 or toROSMsg<PointCloud2> wastes memory with alignment bytes #4939

Closed peci1 closed 8 months ago

peci1 commented 3 years ago

Describe the bug

When converting a PCL pointcloud to ROS PointCloud2, the resulting data format unconditionally keeps the memory alignment. That can have drastic consequences to the size of the sent data.

Context

Depending on the use-case, users might want choose to sacrifice computational performance for size of the data transmitted over network (which can be e.g. the case of 3D lidar output with many fields, few of which are later processed).

Expected behavior

I would expect the API of toROSMsg and toPCLPointCloud2 to have a parameter which would specify whether to keep the alignment or not.

Current Behavior

The alignment is always retained.

To Reproduce

Here is a sample struct of a point from the Ouster lidar:

struct EIGEN_ALIGN16 Point {
    PCL_ADD_POINT4D;
    float intensity;
    uint32_t t;
    uint16_t reflectivity;
    uint8_t ring;
    uint16_t ambient;
    uint32_t range;
    EIGEN_MAKE_ALIGNED_OPERATOR_NEW
};

POINT_CLOUD_REGISTER_POINT_STRUCT(ouster_ros::Point,
    (float, x, x)
    (float, y, y)
    (float, z, z)
    (float, intensity, intensity)
    (std::uint32_t, t, t)
    (std::uint16_t, reflectivity, reflectivity)
    (std::uint8_t, ring, ring)
    (std::uint16_t, ambient, ambient)
    (std::uint32_t, range, range)
)

When this structure gets serialized to sensor_msgs::PointCloud2, each point takes up 48 bytes, which can be seen by e.g. rostopic echo -n1 --noarr /os_cloud_node/points (Ouster driver internally calls toROSMsg: https://github.com/ouster-lidar/ouster_example/blob/47f25ed29ab0b4b6d32c3209fa1b53ea46751d0c/ouster_ros/src/ros.cpp#L93 )

header: 
  seq: 14
  stamp: 
    secs: 1630986591
    nsecs: 655321856
  frame_id: "os_sensor"
height: 128
width: 1024
fields: "<array type: sensor_msgs/PointField, length: 9>"
is_bigendian: False
point_step: 48
row_step: 49152
data: "<array type: uint8, length: 6291456>"
is_dense: True

You can see point_step is 48 bytes.

But the fields carry only 29 bytes of useful information. When sending these scans over a Gigabit network, the difference is 500 Mbps vs. 300 Mbps. And that's a huge difference.

Your Environment (please complete the following information):

Possible Solution

Add an optional parameter to the relevant functions. It could be named e.g. keep_alignment. It should default to true to retain backwards compatibility. The docs of the parameter should also mention that when keep_alignment is set to false, the conversion/copy operation may become costly, as opposed to the relatively cheap large memcpy used when transferring the contiguous chunk of aligned memory.

mvieth commented 3 years ago

Related issue: #1566 What you are describing sounds good. Feel free to create a pull request to add this functionality

peci1 commented 3 years ago

Thanks for the reference. With you saying there is a chance and will to accept the required change, I'll consider sending the PR, but no earlier than in October. I'm adding it to my todo list.

mvieth commented 2 years ago

@peci1 Are you still planning to implement this?

peci1 commented 2 years ago

Yes, but it's not top priority at this moment. Are there any release deadlines closing?

mvieth commented 2 years ago

Yes, but it's not top priority at this moment. Are there any release deadlines closing?

We just made a release in December (PCL 1.12.1), the next release (PCL 1.13.0) will probably be mid-2022

peci1 commented 2 years ago

Thanks. I think mid-2022 is doable ;)

kunaltyagi commented 2 years ago

@peci1 While this effort in PCL is fine, it would also make sense to change the layout in Ouster point format. The issue being:

    float x, y, z;
    float intensity;
    uint32_t t;
    uint16_t reflectivity;
    uint8_t ring;
    uint16_t ambient;
    uint32_t range;

This can be changed to:

    float x, y, z;
    float intensity;
    uint32_t t;
    uint32_t range;
    uint16_t reflectivity;
    uint16_t ambient;
    uint8_t ring;

This would also save some bytes. If you are on a typical system, it should result in a message of size: 32 bytes, not 48. While this isn't optimal, at original 500 Mbps, it would save 33% and use 333 Mbps.

Optimal would improve upon this by a little: (~9.5% of 333)