stevin-dong opened this issue 7 months ago
Hello, did you use the onnx_export script for the ONNX conversion? Could you share your torch and onnx versions?
The torch version shouldn't matter; mine is 2.1.2, and onnx is just the latest version from a default install.
The current ONNX path probably still needs a lot of optimization. With my own ONNX inference code, testing only the VITS part, the model exported by the repo's current export_onnx defaults to running in fp32, and under ONNX Runtime it is slower than native PyTorch (3.6s -> 4.2s). Switching to fp16 required changing a lot of code inside the model (fp32 is hardcoded in many places), and after those changes it actually got slower (4.2s -> 5.48s). Only after tuning some runtime-related settings did it end up slightly faster than the original torch.

After moving the GPT part to ONNX Runtime I observed a significant slowdown (3.5s -> 15s), and switching to IO binding did not help. The TensorRT error you mentioned happens because that framework does not handle models with variable input sizes, like transformers, very well; it needs model-specific adjustments. Overall, the ONNX side doesn't feel officially supported yet.
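For reference, the kind of ONNX Runtime session tuning described above can be sketched like this. Everything here is illustrative (the provider options shown are examples of the tunable knobs, not the exact settings I used):

```python
def provider_list(use_cuda: bool = True):
    """Execution providers in priority order. The CUDA options are
    illustrative examples of tunable settings, not the exact values used."""
    if not use_cuda:
        return ["CPUExecutionProvider"]
    return [
        ("CUDAExecutionProvider", {"cudnn_conv_algo_search": "EXHAUSTIVE"}),
        "CPUExecutionProvider",
    ]

def build_session(model_path: str, use_cuda: bool = True):
    # Imported here so provider_list above stays dependency-free.
    import onnxruntime as ort
    so = ort.SessionOptions()
    # Enable all graph-level optimizations (constant folding, fusions, ...).
    so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    return ort.InferenceSession(
        model_path, sess_options=so, providers=provider_list(use_cuda)
    )
```

A session built this way would be used as `build_session("vits.onnx")` (path illustrative); the provider option dict is where most of the runtime-setting experimentation happens.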
Yes, converting to ONNX alone won't improve speed. If you can convert to TensorRT, the speedup should be significant.
Could you share your ONNX inference code?
So, did you run the ONNX model successfully? Could you show us your ONNX export code? I tried to export an ONNX model with the original code in "onnx_export.py", but when I ran the exported model I got a failure error message.
I successfully ran it. If I remember correctly, the model exported by the original onnx_export.py can run SoVITS in fp32 (that's what I tested at the very beginning).

I made a lot of modifications to the export part to simplify it, accommodate fp16 inference, and allow easier IO binding. However, the performance is not satisfying: VITS is slightly faster than native PyTorch, but GPT is much slower. The code is therefore still far from optimized.

The reason I won't release the ONNX code for now is that there is a big update in this fork: we have successfully wrapped the whole GPT-SoVITS model into TorchScript, and I'm fairly sure that version is faster than the original model on ONNX Runtime. The TorchScript implementation may also be beneficial for ONNX export, so we need some time to reimplement onnx_export based on this new version. (Starting from the TorchScript, I've tried to export directly to TensorRT, but it still fails on the GPT part due to dynamic shapes.)
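For anyone wondering what "wrapping the model into TorchScript" looks like mechanically, here is a minimal sketch on a toy module. The real GPT-SoVITS wrapper is of course far larger; the module and shapes below are purely illustrative:

```python
import torch

class ToyWrapper(torch.nn.Module):
    """Toy stand-in for a GPT-SoVITS-style wrapper module."""
    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Linear(8, 8)

    def forward(self, x: torch.Tensor, max_steps: int = 3) -> torch.Tensor:
        # Data-dependent loops like the GPT decode loop are why
        # torch.jit.script (rather than trace) is needed: tracing would
        # freeze the loop to whatever ran on the example input.
        for _ in range(max_steps):
            x = torch.tanh(self.proj(x))
        return x

scripted = torch.jit.script(ToyWrapper())
out = scripted(torch.randn(2, 8))
# scripted.save("wrapper.pt") would persist it for later torch.jit.load.
```

The scripted module is also what torch.onnx.export can consume later, which is why a TorchScript-first refactor can make the ONNX re-export easier.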
Oh! This is really exciting news! Looking forward to your success!
Did you manage to convert to TensorRT in the end?
I've also run into some problems here. Exporting the model to ONNX produces four ONNX model files. I successfully converted the t2s_encoder model to a TensorRT engine and ran inference with it:

```
Loading engine from /media/star/8T/PycharmProjects/github/gpt-sovits/onnx.wukong_t2s_encoder.fp32.trt
Allocated buffers input for ref_seq: shape=torch.Size([1, 13]), dtype=<class 'numpy.int64'>, size=13
Allocated buffers input for text_seq: shape=torch.Size([1, 24]), dtype=<class 'numpy.int64'>, size=24
Allocated buffers input for ref_bert: shape=torch.Size([13, 1024]), dtype=<class 'numpy.float32'>, size=13312
Allocated buffers input for text_bert: shape=torch.Size([24, 1024]), dtype=<class 'numpy.float32'>, size=24576
Allocated buffers input for ssl_content: shape=torch.Size([1, 768, 249]), dtype=<class 'numpy.float32'>, size=191232
Allocated buffers output for x: shape=(1, 37, 512), dtype=<class 'numpy.float32'>, size=18944
Allocated buffers output for prompts: shape=(1, 124), dtype=<class 'numpy.int64'>, size=124
Output 0: tensor([[[ 2.0710, -1.3951,  2.2285, ...,  0.3731,  0.6351, -0.8600],
         [ 2.8519, -1.7754,  0.4135, ..., -5.9493,  0.3659,  2.7390],
         [-0.0368, -3.0811, -1.6646, ...,  5.0228, -1.9635, -3.3754],
         ...,
         [ 1.6228, -0.1707,  1.9572, ...,  8.0598, -1.8916, -3.7409],
         [-0.8561, -1.7103, -0.8074, ..., -1.8058,  0.3315, -1.7563],
         [-1.8883,  3.7421, -2.1380, ..., -6.0703, -0.7107,  7.7784]]]) torch.Size([1, 37, 512])
Output 1: tensor([[752, 184, 247, 243, 243, 243, 243, 916, 247, 240, 127, 243, 916, 247,
         243, 237, 127, 916, 247, 247, 127,  59, 240, 237, 247, 127, 916, 247,
         243, 916, 995, 247, 243, 916, 247, 127, 916, 247, 127, 243, 916, 127,
         127, 243, 243, 240, 916, 731,  59, 247, 247, 247, 237,  59, 247, 995,
         127, 243, 731, 127, 916, 127, 247, 243, 247, 237, 240, 240, 916, 243,
         944, 247,  22, 247, 243, 237, 127, 127, 916, 247, 278, 243, 127, 247,
         247, 247, 243, 127, 127, 127, 247, 247, 127, 995, 243, 916, 127, 247,
         127, 995, 127, 127, 247, 247, 247, 184, 243, 184, 127, 127, 995, 995,
         127, 995, 247, 127, 247, 247, 240, 731, 127, 243, 127, 184]]) torch.Size([1, 124])
Duration 0.2941598892211914
```
The format matches the two output shapes, x and prompts. However, converting the second model, t2s_fsdec, fails:

```
[05/30/2024-20:54:10] [TRT] [I] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 693, GPU 16287 (MiB)
[05/30/2024-20:54:22] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1939, GPU +348, now: CPU 2768, GPU 16635 (MiB)
[05/30/2024-20:54:22] [TRT] [I] Parsing TensorRT model
[05/30/2024-20:54:23] [TRT] [W] ModelImporter.cpp:420: Make sure input prompts has Int64 binding.
TensorRT ONNX parser error: Assertion failed: !isDynamic(shape): Cannot infer squeeze dimensions from a dynamic shape! Please re-export your model with the Squeeze axes input set.
[05/30/2024-20:54:23] [TRT] [W] Building engine. Depending on model size this may take a while
[05/30/2024-20:54:23] [TRT] [E] 4: [network.cpp::validate::3257] Error Code 4: Internal Error (Network must have at least one output)
[05/30/2024-20:54:23] [TRT] [W] Building engine took 0.0 seconds
```

There are two problems here,
Has anyone here managed to run this successfully?
@wavetao2010 Hi, my conversion to t2s_encoder.trt failed. How did you do the conversion?
After converting to ONNX and then to TensorRT, I get this error:

```
SubGraphCollection_t onnxruntime::TensorrtExecutionProvider::GetSupportedList(SubGraphCollection_t, int, int, const onnxruntime::GraphViewer&, bool*) const [ONNXRuntimeError] : 1 : FAIL : TensorRT input: /vq_model/enc_p/encoder_text/attn_layers.0/Pad_3_output_0 has no shape specified. Please run shape inference on the onnx model first. Details can be found in https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#shape-inference-for-tensorrt-subgraphs
```