First of all, great start for a first step into VC! It's a very promising approach, and I hope you will keep iterating on it. Inference already works pretty well, apart from the slightly worse quality compared to RVC v2.
Streaming also works well, but there are a lot of artifacts when you are not talking. It does not handle silence very well: as soon as you stop talking, you hear Chinese/English voice fragments. I think RVC has a "silent" model called mute for this in its model directory; maybe a similar approach could help improve the streaming capabilities.