Haiyang-W / DSVT

[CVPR2023] Official Implementation of "DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets"
https://arxiv.org/abs/2301.06051
Apache License 2.0
353 stars 28 forks

DSVT-TRT deployment dynamic & static shaping #61

Closed d33dler closed 9 months ago

d33dler commented 9 months ago

I am experimenting on an RTX 2060. Deploying only the DSVT module with TensorRT gives a mediocre improvement over Python (~.05 % speedup), even though I studied the input statistics of my custom data and tuned the dynamic shapes narrowly: optShape sits very close to the usual inputs, and the minShape and maxShape bounds are close to it as well. Moreover, I suspect the speed-up you observe is mostly due to the fp16 conversion and is very hardware dependent, since the trtexec debug log says:

[09/19/2023-19:53:14] [W] [TRT] Myelin graph with multiple dynamic values may have poor performance if they differ. Dynamic values are:
[09/19/2023-19:53:14] [W] [TRT] voxel_number
[09/19/2023-19:53:14] [W] [TRT] set_number_shift_0
[09/19/2023-19:53:14] [W] [TRT] set_number_shift_1
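For readers who want to reproduce the "input statistic study", here is a minimal sketch of how min/opt/max values for one dynamic dimension (for example voxel_number) might be derived from per-frame counts. The function name `shape_profile` and the `margin_pct` parameter are hypothetical and not part of the DSVT repository or TensorRT. The choice of the median as the opt value is an assumption.

```python
import statistics


def shape_profile(counts, margin_pct=10):
    """Derive (min, opt, max) sizes for one dynamic dimension.

    counts     -- observed sizes per frame, e.g. voxels per point cloud
    margin_pct -- safety margin in percent applied below the smallest and
                  above the largest observation (hypothetical parameter)
    """
    if not counts:
        raise ValueError("need at least one observed count")
    lowest, highest = min(counts), max(counts)
    # Integer arithmetic avoids floating-point rounding surprises.
    min_size = lowest * (100 - margin_pct) // 100          # rounded down
    max_size = -(-highest * (100 + margin_pct) // 100)     # rounded up
    # Assumption: the median frame is a reasonable "usual input" for opt.
    opt_size = statistics.median_low(counts)
    return min_size, opt_size, max_size


if __name__ == "__main__":
    observed = [100, 120, 110, 130, 90]
    print(shape_profile(observed))
```

The three values would then be used for the corresponding dimension of the minShapes, optShapes and maxShapes profiles. A tighter margin narrows the range the engine has to cover, but any frame that falls outside [min, max] will be rejected at inference time.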

Did you try static shaping? Assuming the point cloud scene keeps almost the same shape (static recording), I assume that, to obtain consistent values after voxelisation, we need a "point" mask for the point cloud input (points) that pads empty space across all borders so that the number of voxels stays the same (correct me if I'm wrong). What would be the steps to achieve this?

chenshi3 commented 9 months ago

Hello! I've experimented with a static shape configuration, and found it to be faster than the dynamic setup.
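The thread does not spell out how the static configuration was built. One possible approach, sketched here purely as an assumption, is to pad (or truncate) the voxel tensors to a fixed count after voxelisation and carry a validity mask alongside them. The function `pad_voxels`, its arguments, and the `-1` fill value for padded coordinates are hypothetical and are not taken from the DSVT code.

```python
import numpy as np


def pad_voxels(voxel_features, voxel_coords, max_voxels):
    """Pad or truncate voxel tensors to exactly `max_voxels` rows.

    voxel_features -- array of shape (N, C)
    voxel_coords   -- integer array of shape (N, D)
    Returns (features, coords, mask), where mask[i] is True for real voxels.
    """
    keep = min(voxel_features.shape[0], max_voxels)

    features = np.zeros((max_voxels, voxel_features.shape[1]),
                        dtype=voxel_features.dtype)
    # Assumption: -1 marks a padded (invalid) voxel coordinate.
    coords = np.full((max_voxels, voxel_coords.shape[1]), -1,
                     dtype=voxel_coords.dtype)
    mask = np.zeros(max_voxels, dtype=bool)

    # Copy the real voxels; extra voxels beyond max_voxels are dropped.
    features[:keep] = voxel_features[:keep]
    coords[:keep] = voxel_coords[:keep]
    mask[:keep] = True
    return features, coords, mask


if __name__ == "__main__":
    feats = np.ones((3, 4), dtype=np.float32)
    coords = np.arange(9, dtype=np.int32).reshape(3, 3)
    f, c, m = pad_voxels(feats, coords, max_voxels=5)
    print(f.shape, c.shape, m)
```

This only fixes voxel_number. The set-related dimensions named in the trtexec warning (set_number_shift_0, set_number_shift_1) would presumably need the same treatment, and the padded entries would have to be ignored downstream (for instance via the mask) so that they do not affect the real voxels.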