MIT-SPARK / Kimera-Semantics

Real-Time 3D Semantic Reconstruction from 2D data
BSD 2-Clause "Simplified" License

Using dynamic number of labels #56

Open RozDavid opened 3 years ago

RozDavid commented 3 years ago

This addresses my own issue #55 by using dynamically sized semantic prior matrices that are initialized at runtime, reading the number of labels from the launch file as a rosparam.

I believe the biggest concern is performance, so here is a comparison of timings on the provided simulated semantic rosbag file.

With static labels


[ INFO] [1612796164.603515954, 60.342383269]: Layer memory: 16679159
[ INFO] [1612796164.603544030, 60.342383269]: Updating mesh.
[ INFO] [1612796164.620216339, 60.367825609]: Updating mesh.
[ INFO] [1612796164.639555585, 60.414999999]: Integrating a pointcloud with 345600 points.
[ INFO] [1612796164.711103207, 60.544733472]: Finished integrating in 0.071499 seconds, have 339 blocks.
[ INFO] [1612796164.711173976, 60.544733472]: Timings: 
SM Timing
-----------
inserting_missed_blocks      59 00.000255   (00.000004 +- 00.000003)    [00.000001,00.000094]
integrate/fast               59 05.165177   (00.087545 +- 00.009905)    [00.065052,00.165955]
mesh/publish                108 00.286112   (00.002649 +- 00.003277)    [00.000005,00.009061]
mesh/update                 108 00.578068   (00.005352 +- 00.004863)    [00.000168,00.014176]
ptcloud_preprocess           59 00.590554   (00.010009 +- 00.004439)    [00.008309,00.034639]
remove_distant_blocks        59 00.000979   (00.000017 +- 00.000005)    [00.000006,00.000031]

With dynamic labels

[ INFO] [1612795707.691550318, 59.152784140]: Layer memory: 16629958
[ INFO] [1612795707.691564063, 59.152784140]: Updating mesh.
[ INFO] [1612795707.707433151, 59.172894621]: Updating mesh.
[ INFO] [1612795707.728666358, 59.233487071]: Updating mesh.
[ INFO] [1612795707.775409911, 59.323973636]: Integrating a pointcloud with 345600 points.
[ INFO] [1612795707.861115732, 59.483209653]: Finished integrating in 0.085651 seconds, have 339 blocks.
[ INFO] [1612795707.861209100, 59.483209653]: Timings: 
SM Timing
-----------
inserting_missed_blocks      54 00.000232   (00.000004 +- 00.000003)    [00.000001,00.000066]
integrate/fast               54 05.019893   (00.092961 +- 00.010746)    [00.070486,00.244003]
mesh/publish                 98 00.245623   (00.002506 +- 00.003167)    [00.000007,00.009175]
mesh/update                  98 00.499049   (00.005092 +- 00.005313)    [00.000144,00.018225]
ptcloud_preprocess           54 00.486605   (00.009011 +- 00.000462)    [00.008290,00.018598]
remove_distant_blocks        54 00.000844   (00.000016 +- 00.000005)    [00.000002,00.000034]
RozDavid commented 3 years ago

Hey @ToniRV,

Thanks a lot for the feedback and the ideas. I think it would be really nice to have this runtime flexibility, but you are also perfectly right that it can't come at the cost of performance.

I am sadly not an expert here, but I did some quick research on Eigen's dynamic memory allocation and found the following. Source.

Here is this constructor:

inline DenseStorage(int size, int rows, int) : m_data(internal::aligned_new<T>(size)), m_rows(rows) {}

Here, the m_data member is the actual array of coefficients of the matrix. As you see, it is dynamically allocated. Rather than calling new[] or malloc(), as you can see, we have our own internal::aligned_new defined in src/Core/util/Memory.h. What it does is that if vectorization is enabled, then it uses a platform-specific call to allocate a 128-bit-aligned array, as that is very useful for vectorization with both SSE2 and AltiVec. If vectorization is disabled, it amounts to the standard new[].

I believe that if we don't change the directives EIGEN_DONT_ALIGN and EIGEN_MAX_ALIGN_BYTES (source) in the CMake build or with #defines, the memory alignment should be all right and the matrix operations will be automatically vectorized.

What do you think of this? If I replace the initialization with internal::aligned_new<T>, could that bridge the performance gap?

ToniRV commented 3 years ago

@RozDavid When re-reading the Eigen docs I also understood that the operations should already be vectorized... So it might well be that we can't improve on that. That's fine for me; I think this adds a lot of flexibility anyway. Another thing we should be careful about is the map size: having a 128-bit-aligned array on a per-voxel basis may dramatically increase the size of the volumetric map. Maybe you could try to generate a map with and without this feature and see how many MB one is with respect to the other? The rosservice save_map should do just that.

RozDavid commented 3 years ago

Hello @ToniRV,

I ran a few quick tests comparing the static and dynamic approaches with different numbers of labels for the probability matrices.

For the static test I recompiled the code with kTotalNumberOfLabels=128, changed the hardcoded prior to the appropriate size, and tested the same 21- and 128-label settings with the dynamic sizes as well. I ran the full rosbag, saved the layer in a vxblx file, and copied the timings for the last pointcloud integration, i.e. with the maximum number of initialized blocks. The stats are copied at the end of this comment.

It was a bit surprising to me that there is no difference between the layer sizes, only in the integration times. My interpretation is that Eigen allocates fixed-size memory up to a certain matrix size (it turns out this threshold might be larger than 128).

As all the vxblx file sizes are between 66.3 and 67.2 MB, the only differences here are the number of allocated blocks and a roughly 10-20% performance loss in pointcloud integration, depending on the number of labels.

Whether the flexibility is worth the performance loss surely depends on the use case, but I wanted to share these results with you either way, whether you choose to merge or not.

The results can be compared here:

########## Dynamic 21 ##########

Vxblx file size: 67.1Mb

[ INFO] [1612890061.997756189, 121.805000000]: Integrating a pointcloud with 345600 points.
[ INFO] [1612890062.089571026, 121.805000000]: Finished integrating in 0.091762 seconds, have 1207 blocks.
[ INFO] [1612890062.089766452, 121.805000000]: Timings: 
SM Timing
-----------
inserting_missed_blocks     285 00.001599   (00.000006 +- 00.000001)    [00.000001,00.000136]
integrate/fast              285 27.095124   (00.095071 +- 00.004953)    [00.058792,00.243477]
mesh/publish                399 01.489049   (00.003732 +- 00.002256)    [00.000006,00.013048]
mesh/update                 399 03.108085   (00.007790 +- 00.002853)    [00.000171,00.018809]
ptcloud_preprocess          285 02.596166   (00.009109 +- 00.000871)    [00.008302,00.036305]
remove_distant_blocks       285 00.019437   (00.000068 +- 00.000045)    [00.000004,00.000402]

[ INFO] [1612890062.089818329, 121.805000000]: Layer memory: 59385627
[ INFO] [1612890062.089849037, 121.805000000]: Updating mesh.

########## Dynamic 128 ##########

Vxblx file size: 66.3Mb

[ INFO] [1612889885.600096098, 121.805000000]: Integrating a pointcloud with 345600 points.
[ INFO] [1612889886.008610556, 121.805000000]: Finished integrating in 0.408455 seconds, have 1205 blocks.
[ INFO] [1612889886.008808061, 121.805000000]: Timings: 
SM Timing
-----------
inserting_missed_blocks      83 00.001354   (00.000016 +- 00.000028)    [00.000001,00.000178]
integrate/fast               83 36.546054   (00.440314 +- 00.047001)    [00.274163,00.750914]
mesh/publish                 92 00.422405   (00.004591 +- 00.001759)    [00.000006,00.011521]
mesh/update                  92 00.844049   (00.009174 +- 00.001211)    [00.000178,00.014639]
ptcloud_preprocess           83 00.757355   (00.009125 +- 00.001122)    [00.008222,00.022541]
remove_distant_blocks        83 00.006077   (00.000073 +- 00.000031)    [00.000002,00.000203]

[ INFO] [1612889886.008843359, 121.805000000]: Layer memory: 59287225

########## Static 21 ##########

Vxblx file size: 67.2Mb

[ INFO] [1612890454.482768384, 121.805000000]: Integrating a pointcloud with 345600 points.
[ INFO] [1612890454.564294501, 121.805000000]: Finished integrating in 0.081479 seconds, have 1209 blocks.
[ INFO] [1612890454.564449636, 121.805000000]: Timings: 
SM Timing
-----------
inserting_missed_blocks     290 00.001434   (00.000005 +- 00.000001)    [00.000001,00.000117]
integrate/fast              290 24.979322   (00.086136 +- 00.004859)    [00.053003,00.161756]
mesh/publish                493 01.521718   (00.003087 +- 00.003551)    [00.000007,00.008997]
mesh/update                 493 03.285606   (00.006665 +- 00.005909)    [00.000138,00.017277]
ptcloud_preprocess          290 02.654694   (00.009154 +- 00.002964)    [00.008272,00.034363]
remove_distant_blocks       290 00.018787   (00.000065 +- 00.000037)    [00.000002,00.000652]

[ INFO] [1612890454.564478470, 121.805000000]: Layer memory: 59484029

########## Static 128 ##########

Vxblx file size: 66.5Mb

[ INFO] [1612891013.389336555, 121.805000000]: Integrating a pointcloud with 345600 points.
[ INFO] [1612891013.729743330, 121.805000000]: Finished integrating in 0.340356 seconds, have 1204 blocks.
[ INFO] [1612891013.729900746, 121.805000000]: Timings: 
SM Timing
-----------
inserting_missed_blocks      97 00.001169   (00.000012 +- 00.000019)    [00.000001,00.000123]
integrate/fast               97 35.617008   (00.367186 +- 00.036256)    [00.230930,00.561777]
mesh/publish                106 00.547046   (00.005161 +- 00.001862)    [00.000006,00.012017]
mesh/update                 106 01.092757   (00.010309 +- 00.001616)    [00.000182,00.018571]
ptcloud_preprocess           97 00.884389   (00.009117 +- 00.000556)    [00.008329,00.018017]
remove_distant_blocks        97 00.005814   (00.000060 +- 00.000037)    [00.000003,00.000304]

[ INFO] [1612891013.729936872, 121.805000000]: Layer memory: 59238024
[ INFO] [1612891013.729965223, 121.805000000]: Updating mesh.