Open vitreo12 opened 2 months ago
Thanks for your positive comments. Regarding your questions: we already have a solution for streaming inference, but we are still working on improving its stability and inference speed before releasing it.
Nice! Looking forward to the implementation then :)
Streaming inference GUI has been released
Hello!
First off, amazing project! I cloned it and got it up and running quite easily.
I have a question about the streaming inference mentioned in the README. Is it meant to allow running inference on real-time audio, instead of one-shot conversion? I ran some benchmarks, and the bottleneck seems to be the CosyVoice speech token extraction. I wonder how this could work with real-time audio: the target voice's speech tokens can be extracted before inference time, but how would you approach extracting the tokens for the source voice? Do you plan on a different architecture to make it work for real-time audio streams?
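For what it's worth, here is a minimal sketch of what I imagine: cache the target voice's tokens once ahead of time, then buffer the incoming source stream and pay the token-extraction cost per chunk. All function names here (`extract_speech_tokens`, `convert`) are placeholders standing in for the real model calls, not the project's actual API:

```python
CHUNK = 16000  # assumed: 1 second of 16 kHz audio per inference chunk


def extract_speech_tokens(audio):
    """Placeholder for the speech token extractor; emits one dummy
    token per 320 samples (~50 tokens/s at 16 kHz)."""
    return [0] * (len(audio) // 320)


def convert(src_tokens, target_tokens):
    """Placeholder for the actual voice-conversion step; returns
    dummy audio of the same duration as the source chunk."""
    return [0.0] * (len(src_tokens) * 320)


def stream_convert(source_stream, target_tokens):
    """Consume the source audio block by block; the target voice's
    tokens were extracted once up front and are simply reused."""
    buffer = []
    for block in source_stream:
        buffer.extend(block)
        # Run inference whenever a full chunk has accumulated;
        # only the source tokens are extracted at stream time.
        while len(buffer) >= CHUNK:
            chunk, buffer = buffer[:CHUNK], buffer[CHUNK:]
            src_tokens = extract_speech_tokens(chunk)  # per-chunk cost
            yield convert(src_tokens, target_tokens)
```

This pushes all target-side work offline, but the per-chunk source token extraction is still on the hot path, which is why I'm curious whether you have a different architecture in mind.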
Thank you and have a great day!