DSVT-TRT deployment dynamic & static shaping

I experimented on an RTX 2060, deploying only the DSVT module, and saw only a marginal improvement over Python (~0.05 % speedup). This is despite studying the input statistics on custom data and narrowing the dynamic shapes so that optShape sits very close to the usual inputs, with the minShape and maxShape bounds also close to it. Moreover, I suspect the speedup you observe is merely due to the fp16 conversion and is very hardware-dependent, since the trtexec debug log says:

[09/19/2023-19:53:14] [W] [TRT] Myelin graph with multiple dynamic values may have poor performance if they differ. Dynamic values are:
[09/19/2023-19:53:14] [W] [TRT] voxel_number
[09/19/2023-19:53:14] [W] [TRT] set_number_shift_0
[09/19/2023-19:53:14] [W] [TRT] set_number_shift_1
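For reference, the kind of input statistics study described above can be sketched as follows. This is only a minimal illustration, not the exact procedure I used: it takes per-frame voxel counts, uses the observed minimum and maximum as the profile bounds and the median as the opt value, and formats them the way trtexec's `--minShapes`, `--optShapes` and `--maxShapes` flags expect (`name:AxB`). The tensor name `voxel_features` and the feature width of 128 are hypothetical placeholders; the real names depend on how the ONNX model was exported.

```python
import statistics
from typing import Dict, Sequence


def profile_bounds(voxel_counts: Sequence[int]) -> Dict[str, int]:
    """Derive min/opt/max values for one dynamic dimension from observed inputs."""
    if not voxel_counts:
        raise ValueError("need at least one observed voxel count")
    return {
        "min": min(voxel_counts),                     # smallest frame seen
        "opt": statistics.median_low(voxel_counts),   # a typical frame
        "max": max(voxel_counts),                     # largest frame seen
    }


def shape_spec(tensor_name: str, dims: Sequence[int]) -> str:
    """Format a shape as 'name:AxB', the syntax used by trtexec shape flags."""
    return f"{tensor_name}:" + "x".join(str(d) for d in dims)


if __name__ == "__main__":
    # Hypothetical per-frame voxel counts collected from a custom dataset.
    counts = [11800, 12050, 12100, 12240, 12400]
    bounds = profile_bounds(counts)
    # "voxel_features" and 128 are placeholders, not names from the DSVT export.
    for key in ("min", "opt", "max"):
        spec = shape_spec("voxel_features", (bounds[key], 128))
        print(f"--{key}Shapes={spec}")
```

Note that a frame whose voxel count falls outside the min and max bounds is rejected at inference time, so the bounds have to cover every input you expect to run.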

Did you try static shaping? Assuming the point cloud scene keeps almost the same shape (a static recording), I assume that to obtain consistent values after voxelisation we need a "point" mask on the point cloud input (points) that pads the empty space along all borders, so that the number of voxels stays the same (correct me if I'm wrong). What would be the steps to achieve this?
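To make the question concrete, here is a rough sketch of one possible variant of the idea. It is an assumption on my side, not something taken from the DSVT code: instead of padding the raw points, it pads (or truncates) the voxel features and coordinates to a fixed count after voxelisation and returns a boolean mask marking the real voxels. The function name `pad_voxels`, the array layouts and the `-1` fill value for padded coordinates are all hypothetical. It would also only fix `voxel_number`; the two `set_number_shift_*` values would presumably still need separate handling.

```python
import numpy as np


def pad_voxels(features: np.ndarray, coords: np.ndarray, target_count: int):
    """Pad or truncate voxel arrays to exactly `target_count` rows.

    features: (N, C) voxel features; coords: (N, D) voxel coordinates.
    Returns the padded features, the padded coords and a validity mask.
    """
    n = features.shape[0]
    keep = min(n, target_count)  # voxels beyond target_count are dropped

    padded_feats = np.zeros((target_count, features.shape[1]), dtype=features.dtype)
    # -1 marks a padded row (assumed convention; real coordinates are >= 0).
    padded_coords = np.full((target_count, coords.shape[1]), -1, dtype=coords.dtype)
    mask = np.zeros(target_count, dtype=bool)

    padded_feats[:keep] = features[:keep]
    padded_coords[:keep] = coords[:keep]
    mask[:keep] = True  # True = real voxel, False = padding
    return padded_feats, padded_coords, mask


if __name__ == "__main__":
    feats = np.ones((3, 4), dtype=np.float32)
    coords = np.arange(12, dtype=np.int32).reshape(3, 4)
    f, c, m = pad_voxels(feats, coords, target_count=5)
    print(f.shape, c.shape, m)
```

The mask would then have to be respected inside the network (for example when building the window sets and in attention), otherwise the padded voxels would change the outputs.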

Haiyang-W / DSVT

DSVT-TRT deployment dynamic & static shaping #61