Inference speed is very slow (Issue #75) · Open · zhusy09 opened 4 months ago

In my tests the inference speed is very slow. Running on an A100, generating 1 minute of audio can sometimes take close to 1 minute of inference time. Are there any ways to optimize this?
Well, you can try fp16 inference. We may try some inference optimization methods later, but right now we're focused on fixing bugs and making this repo easier to use.
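For anyone who wants to try the fp16 suggestion, here is a minimal sketch using PyTorch autocast instead of permanently converting the weights with .half(). The CosyVoice class path, model directory, and inference_sft call are my assumptions based on the repo's usual usage and may differ in your version; whether this actually speeds up the LLM part is exactly what the rest of this thread debates.

```python
import torch
from cosyvoice.cli.cosyvoice import CosyVoice  # import path assumed from the repo layout

# Load a pretrained model (the directory name here is an assumption).
cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-SFT')

text = '你好,我是通义生成式语音大模型,请问有什么可以帮您的吗?'

# Run inference under autocast so matmuls execute in fp16 where safe,
# without converting the model weights themselves.
with torch.inference_mode(), torch.autocast(device_type='cuda', dtype=torch.float16):
    output = cosyvoice.inference_sft(text, '中文女')
```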
I think you can split text into small pieces.
Text segmentation only speeds up the streaming response; it does not improve the overall inference efficiency.
Transformer attention has O(N^2) time complexity, so splitting sentences should improve it.
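To make the O(N^2) point concrete: splitting a text of 10·L tokens into 10 chunks of L tokens cuts the attention cost from (10·L)^2 = 100·L^2 down to 10·L^2, a 10x reduction in theory. Whether that shows up in wall-clock time depends on how much of the runtime is really spent on attention over the full text; the measurement below suggests it is not the bottleneck here.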
Sorry, I was wrong: it can't improve things. On a single RTX 3090: generating '你好,我是通义生成式语音大模型,请问有什么可以帮您的吗?' 10 times takes 36 seconds, while generating '你好,我是通义生成式语音大模型,请问有什么可以帮您的吗?' * 10 (the same sentence concatenated 10 times) in one call takes 30 seconds.
The reason is that the author already applies a text splitter inside self.frontend.text_normalize.
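In other words, long input is already cut into short segments before the model sees it, so pre-splitting by hand mostly duplicates that work. A paraphrase of the idea (not a verbatim copy of the repo, and the split=True argument is my reading of the code):

```python
# text_normalize() with split=True already returns a list of short segments,
# and inference then loops over them one by one, so the LLM never attends
# over the full long text in a single pass.
segments = self.frontend.text_normalize(tts_text, split=True)
for segment in segments:
    ...  # synthesize each segment separately and concatenate the audio
```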
Can this be deployed on Kaggle and run on a T4 GPU? I'm not sure what the inference speed would be like.
I tried fp16, but it didn't make any difference; I also used torch.compile, but no improvement either.
You can try it; a T4 should be OK.
How do I change it to run fp16 inference?
Same question.
Flash Attention 1/2, PagedAttention, and model quantization could speed up inference significantly.
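None of these appear to be wired into the repo out of the box, judging by this thread. As one hedged example, PyTorch 2.x already ships a fused scaled-dot-product attention that dispatches to FlashAttention-style kernels where supported. A standalone sketch, not CosyVoice code:

```python
import torch
import torch.nn.functional as F

# Standalone demo of PyTorch's fused attention (PyTorch >= 2.0); it dispatches
# to a FlashAttention-style kernel on supported GPUs/dtypes. This is NOT wired
# into CosyVoice here; it only illustrates the kind of kernel being suggested.
q = torch.randn(1, 8, 1024, 64, device='cuda', dtype=torch.float16)
k = torch.randn(1, 8, 1024, 64, device='cuda', dtype=torch.float16)
v = torch.randn(1, 8, 1024, 64, device='cuda', dtype=torch.float16)

out = F.scaled_dot_product_attention(q, k, v)  # fused, memory-efficient attention
print(out.shape)  # torch.Size([1, 8, 1024, 64])
```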
I also switched to fp16; GPU memory usage was halved, but the speed didn't change...
How do you change it? After I modified it, the generated audio had no sound.
I used autocast directly; once the text gets long I also get silent audio, but short text works fine.
I also called model.half(); only the flow part gets faster, the LLM part is just as slow. Have you solved this?
How do you half the flow part? I keep getting RuntimeError: mat1 and mat2 must have the same dtype, but got Float and Half.
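That error usually means the flow module's weights were converted with .half() while its input tensors are still float32. A hedged sketch of the two usual workarounds (the flow attribute and input names below are placeholders, not the repo's exact interface):

```python
import torch

# Option 1: don't convert the weights at all; let autocast choose fp16 per-op,
# so float32 inputs are handled automatically.
with torch.autocast(device_type='cuda', dtype=torch.float16):
    output = model.flow(feat)        # 'model.flow' and 'feat' are placeholders

# Option 2: if you do call model.flow.half(), cast the inputs to match,
# otherwise you get "mat1 and mat2 must have the same dtype".
model.flow.half()
output = model.flow(feat.half())
```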
Has anyone found a solution?
I hit the same problem: even with streaming, a first sentence of 37 characters still takes 18 seconds before the audio stream starts.
Splitting long text into sentences and running inference on them in parallel with multiple threads can noticeably shorten the total time.
@boomyao Do you mean infer("1") + infer("2") + infer("3") <= infer("1"+"2"+"3")? Is it much faster? In my tests on a T4 the GPU utilization is already maxed out, so it can only run one stream at a time.
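For reference, a minimal sketch of the "split + parallel threads" idea with Python's ThreadPoolExecutor. The synthesize function and the sentence splitting are placeholders standing in for whatever CosyVoice call you use; as noted above, this only helps if a single request does not already saturate the GPU.

```python
from concurrent.futures import ThreadPoolExecutor

def synthesize(sentence: str):
    # Placeholder: call your actual CosyVoice inference here, e.g. inference_sft(...).
    ...

long_text = "第一句。第二句。第三句。"
sentences = [s + "。" for s in long_text.split("。") if s]

# Run the per-sentence calls concurrently; results come back in input order,
# so the audio chunks can simply be concatenated afterwards.
with ThreadPoolExecutor(max_workers=4) as pool:
    audio_chunks = list(pool.map(synthesize, sentences))
```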
Hi, has this problem been solved? I'm running into it too.