Haiyang-W / DSVT

[CVPR2023] Official Implementation of "DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets"
https://arxiv.org/abs/2301.06051
Apache License 2.0

Deploy Problem #40

Closed sylcito closed 1 year ago

sylcito commented 1 year ago

Hi, thanks for your great work! As I understand it, in the DSVT deployment you converted the Transformer part of the network into DSVT_TrtEngine, while DSVT_Input_Layer still runs the original PyTorch code. Can DSVT_Input_Layer also be converted to ONNX-TRT? It contains operators such as torch.sort and torch.unique that TRT does not support, so I plan to rewrite DSVT_Input_Layer as a single CUDA kernel when deploying the entire model. Do you have any suggestions for a faster and more convenient approach? Looking forward to your reply.

Haiyang-W commented 1 year ago

Good question. This is a direction for future optimization. The input layer cannot be deployed the same way, because it is not a large matrix computation like the rest of the network. Its cost is fixed, though: this step takes about 8 ms and does not grow as the network gets deeper. If you want to speed it up, you can cache the position embedding and write some custom CUDA ops.
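
For anyone trying the caching suggestion, here is a minimal sketch of the idea in plain Python. It is not the DSVT code. It assumes the position embedding depends only on a voxel's coordinate inside its window, so every possible value can be computed once and then looked up at inference time. `WINDOW_SHAPE`, `EMBED_DIM`, `pos_embed`, `build_table` and `lookup` are hypothetical names, and `pos_embed` is only a stand-in for the learned embedding network.

```python
import math
from itertools import product

WINDOW_SHAPE = (12, 12, 1)  # hypothetical window size in voxels (x, y, z)
EMBED_DIM = 4               # hypothetical; kept tiny for illustration


def pos_embed(local_coord):
    """Stand-in for the learned position-embedding network (assumption)."""
    x, y, z = local_coord
    phase = x + 2 * y + 3 * z + 1
    return tuple(math.sin(phase * (k + 1) * 0.1) for k in range(EMBED_DIM))


def build_table(window_shape):
    """Evaluate the embedding once for every possible window-local coordinate."""
    all_local = product(*(range(s) for s in window_shape))
    return {coord: pos_embed(coord) for coord in all_local}


def lookup(table, voxel_coords, window_shape):
    """Reduce global voxel coords to window-local coords and fetch cached rows."""
    rows = []
    for coord in voxel_coords:
        local = tuple(v % s for v, s in zip(coord, window_shape))
        rows.append(table[local])
    return rows


# Built once, offline; reused for every frame.
POS_TABLE = build_table(WINDOW_SHAPE)

coords = [(0, 0, 0), (13, 5, 0), (25, 30, 0)]
cached = lookup(POS_TABLE, coords, WINDOW_SHAPE)
```

In a real deployment the table would be a tensor on the GPU and the lookup a gather, which avoids running the embedding network per frame. Whether this applies directly depends on how the embedding is actually computed in the model.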