stevin-dong opened this issue 7 months ago
Hello, did you use the onnx_export script for the ONNX conversion? Could you share your torch and onnx versions?
The torch version shouldn't matter; mine is 2.1.2, and onnx is just the latest version from a default install.
The current ONNX path probably still needs a lot of optimization. With my own ONNX inference code, testing only the VITS part, the model exported by the repo's current export_onnx defaults to running in fp32, and under ONNX Runtime it is slower than native PyTorch (3.6s -> 4.2s). Switching to fp16 required changing a lot of code inside the model (fp32 is hardcoded in many places), and after those changes it actually got slower (4.2s -> 5.48s). Only after tuning some runtime-related settings did it end up slightly faster than the original torch.

After moving the GPT part to ONNX Runtime I observed a significant slowdown (3.5s -> 15s), and switching to IO binding did not help. The TensorRT error you mentioned happens because that framework does not handle models with variable input sizes, like transformers, very well; it needs model-specific adjustments. Overall, the ONNX side doesn't feel officially supported yet.
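For reference, the kind of ONNX Runtime session tuning described above can be sketched like this. Everything here is illustrative (the provider options shown are examples of the tunable knobs, not the exact settings I used):

```python
def provider_list(use_cuda: bool = True):
    """Execution providers in priority order. The CUDA options are
    illustrative examples of tunable settings, not the exact values used."""
    if not use_cuda:
        return ["CPUExecutionProvider"]
    return [
        ("CUDAExecutionProvider", {"cudnn_conv_algo_search": "EXHAUSTIVE"}),
        "CPUExecutionProvider",
    ]

def build_session(model_path: str, use_cuda: bool = True):
    # Imported here so provider_list above stays dependency-free.
    import onnxruntime as ort
    so = ort.SessionOptions()
    # Enable all graph-level optimizations (constant folding, fusions, ...).
    so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    return ort.InferenceSession(
        model_path, sess_options=so, providers=provider_list(use_cuda)
    )
```

A session built this way would be used as `build_session("vits.onnx")` (path illustrative); the provider option dict is where most of the runtime-setting experimentation happens.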
Yes, converting to ONNX alone won't improve speed. If you can convert to TensorRT, the speedup should be significant.
Could you share your ONNX inference code?
So, did you run the ONNX model successfully? Could you show us your ONNX export code? I tried to export an ONNX model with the original code in "onnx_export.py", but when I ran the exported model I got a failure error message.
I successfully ran it. If I remember correctly, the model exported by the original onnx_export.py can run SoVITS in fp32 (that's what I tested at the very beginning).

I made a lot of modifications to the export part to simplify it, accommodate fp16 inference, and allow easier IO binding. However, the performance is not satisfying: VITS is slightly faster than native PyTorch, but GPT is much slower. The code is therefore still far from optimized.

The reason I won't release the ONNX code for now is that there is a big update in this fork: we have successfully wrapped the whole GPT-SoVITS model into TorchScript, and I'm fairly sure that version is faster than the original model on ONNX Runtime. The TorchScript implementation may also be beneficial for ONNX export, so we need some time to reimplement onnx_export based on this new version. (Starting from the TorchScript, I've tried to export directly to TensorRT, but it still fails on the GPT part due to dynamic shapes.)
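For anyone wondering what "wrapping the model into TorchScript" looks like mechanically, here is a minimal sketch on a toy module. The real GPT-SoVITS wrapper is of course far larger; the module and shapes below are purely illustrative:

```python
import torch

class ToyWrapper(torch.nn.Module):
    """Toy stand-in for a GPT-SoVITS-style wrapper module."""
    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Linear(8, 8)

    def forward(self, x: torch.Tensor, max_steps: int = 3) -> torch.Tensor:
        # Data-dependent loops like the GPT decode loop are why
        # torch.jit.script (rather than trace) is needed: tracing would
        # freeze the loop to whatever ran on the example input.
        for _ in range(max_steps):
            x = torch.tanh(self.proj(x))
        return x

scripted = torch.jit.script(ToyWrapper())
out = scripted(torch.randn(2, 8))
# scripted.save("wrapper.pt") would persist it for later torch.jit.load.
```

The scripted module is also what torch.onnx.export can consume later, which is why a TorchScript-first refactor can make the ONNX re-export easier.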
Oh! This is really exciting news! Looking forward to your success!
Did you manage to convert to TensorRT in the end?
I've also run into some problems here. Exporting the model to ONNX produces four ONNX model files. I successfully converted the t2s_encoder model to a TensorRT engine and ran inference with it:

```
Loading engine from /media/star/8T/PycharmProjects/github/gpt-sovits/onnx.wukong_t2s_encoder.fp32.trt
Allocated buffers input for ref_seq: shape=torch.Size([1, 13]), dtype=<class 'numpy.int64'>, size=13
Allocated buffers input for text_seq: shape=torch.Size([1, 24]), dtype=<class 'numpy.int64'>, size=24
Allocated buffers input for ref_bert: shape=torch.Size([13, 1024]), dtype=<class 'numpy.float32'>, size=13312
Allocated buffers input for text_bert: shape=torch.Size([24, 1024]), dtype=<class 'numpy.float32'>, size=24576
Allocated buffers input for ssl_content: shape=torch.Size([1, 768, 249]), dtype=<class 'numpy.float32'>, size=191232
Allocated buffers output for x: shape=(1, 37, 512), dtype=<class 'numpy.float32'>, size=18944
Allocated buffers output for prompts: shape=(1, 124), dtype=<class 'numpy.int64'>, size=124
Output 0: tensor([[[ 2.0710, -1.3951,  2.2285, ...,  0.3731,  0.6351, -0.8600],
         [ 2.8519, -1.7754,  0.4135, ..., -5.9493,  0.3659,  2.7390],
         [-0.0368, -3.0811, -1.6646, ...,  5.0228, -1.9635, -3.3754],
         ...,
         [ 1.6228, -0.1707,  1.9572, ...,  8.0598, -1.8916, -3.7409],
         [-0.8561, -1.7103, -0.8074, ..., -1.8058,  0.3315, -1.7563],
         [-1.8883,  3.7421, -2.1380, ..., -6.0703, -0.7107,  7.7784]]]) torch.Size([1, 37, 512])
Output 1: tensor([[752, 184, 247, 243, 243, 243, 243, 916, 247, 240, 127, 243, 916, 247,
         243, 237, 127, 916, 247, 247, 127,  59, 240, 237, 247, 127, 916, 247,
         243, 916, 995, 247, 243, 916, 247, 127, 916, 247, 127, 243, 916, 127,
         127, 243, 243, 240, 916, 731,  59, 247, 247, 247, 237,  59, 247, 995,
         127, 243, 731, 127, 916, 127, 247, 243, 247, 237, 240, 240, 916, 243,
         944, 247,  22, 247, 243, 237, 127, 127, 916, 247, 278, 243, 127, 247,
         247, 247, 243, 127, 127, 127, 247, 247, 127, 995, 243, 916, 127, 247,
         127, 995, 127, 127, 247, 247, 247, 184, 243, 184, 127, 127, 995, 995,
         127, 995, 247, 127, 247, 247, 240, 731, 127, 243, 127, 184]]) torch.Size([1, 124])
Duration 0.2941598892211914
```
The format matches the two output shapes, x and prompts. However, converting the second model, t2s_fsdec, fails:

```
[05/30/2024-20:54:10] [TRT] [I] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 693, GPU 16287 (MiB)
[05/30/2024-20:54:22] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1939, GPU +348, now: CPU 2768, GPU 16635 (MiB)
[05/30/2024-20:54:22] [TRT] [I] Parsing TensorRT model
[05/30/2024-20:54:23] [TRT] [W] ModelImporter.cpp:420: Make sure input prompts has Int64 binding.
TensorRT ONNX parser error: Assertion failed: !isDynamic(shape): Cannot infer squeeze dimensions from a dynamic shape! Please re-export your model with the Squeeze axes input set.
[05/30/2024-20:54:23] [TRT] [W] Building engine. Depending on model size this may take a while
[05/30/2024-20:54:23] [TRT] [E] 4: [network.cpp::validate::3257] Error Code 4: Internal Error (Network must have at least one output)
[05/30/2024-20:54:23] [TRT] [W] Building engine took 0.0 seconds
```

There are two problems here,
Has anyone here managed to run this successfully?
@wavetao2010 Hi, my conversion to t2s_encoder.trt failed. How did you do the conversion?
After converting to ONNX and then to TensorRT, I get this error:

```
SubGraphCollection_t onnxruntime::TensorrtExecutionProvider::GetSupportedList(SubGraphCollection_t, int, int, const onnxruntime::GraphViewer&, bool*) const [ONNXRuntimeError] : 1 : FAIL : TensorRT input: /vq_model/enc_p/encoder_text/attn_layers.0/Pad_3_output_0 has no shape specified. Please run shape inference on the onnx model first. Details can be found in https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#shape-inference-for-tensorrt-subgraphs
```