[pcl::normal] Why pcl::normal is designed to use 32 bytes？

QiuYilin commented 1 month ago

pcl::normal contains four float data: nx, ny, nz and curve, but uses two float[4] for alignment.

  struct EIGEN_ALIGN16 _Normal
  {
    PCL_ADD_NORMAL4D // This adds the member normal[3] which can also be accessed using the point (which is float[4])
    union
    {
      struct
      {
        float curvature;
      };
      float data_c[4];
    };
    PCL_MAKE_ALIGNED_OPERATOR_NEW
  };

I noticed that the gpu module also tends to use float[4] to represent normal, which leads to a very strange calling form：https://github.com/PointCloudLibrary/pcl/issues/2419. （Although it is also possible to use pcl::gpu::Feature::NormalType instead of pcl::PointXYZ）

namespace pcl
{
    namespace gpu
    {
        ////////////////////////////////////////////////////////////////////////////////////////////  
        /** \brief @b //////////////////////////////////////////////////////////////////////  
        /** \brief @b Feature represents the base feature class.  */

        struct PCL_EXPORTS Feature
        {
        public:
            using PointType = PointXYZ;
            using NormalType = PointXYZ;

            using PointCloud = DeviceArray<PointType>;
            using Normals = DeviceArray<NormalType>;
            using Indices = DeviceArray<int>;Feature represents the base feature class.  */

        struct PCL_EXPORTS Feature
        {
        public:
            using PointType = PointXYZ;
            using NormalType = PointXYZ;

            using PointCloud = DeviceArray<PointType>;
            using Normals = DeviceArray<NormalType>;
            using Indices = DeviceArray<int>;

What are the considerations for doing this?

.

mvieth commented 1 month ago

I believe https://pcl.readthedocs.io/projects/tutorials/en/master/adding_custom_ptype.html should answer your question

QiuYilin commented 1 month ago

In an ideal world, these 4 components would create a single structure, SSE-aligned. However, because the majority of point operations will either set the last component of the data[4] array (from the xyz union) to 0 or 1 (for transformations), we cannot make intensity a member of the same structure, as its contents will be overwritten. For example, a dot product between two points will set their 4th component to 0, otherwise the dot product doesn’t make sense, etc.

Therefore for SSE-alignment, we pad intensity with 3 extra floats. Inefficient in terms of storage, but good in terms of memory alignment.

Now I know why pcl::normal is designed to use 32 bytes. SIMD is a mechanism on the CPU, so it is not necessary to use it on the GPU, right?

mvieth commented 1 month ago

SIMD is a mechanism on the CPU, so it is not necessary to use it on the GPU, right?

That is a very general question, and I don't know enough about GPUs to answer this.

Please use Stackoverflow or the Discord community chat for such questions in the future. The GitHub issues are intended mainly for bug reports and compile errors.

PointCloudLibrary / pcl

[pcl::normal] Why pcl::normal is designed to use 32 bytes？ #6049