effusiveperiscope / so-vits-svc

so-vits-svc
MIT License

stable f0 estimation #2

Closed soliloqueen closed 1 year ago

soliloqueen commented 1 year ago

cc results in more stable f0 estimation than ac at these sorts of time steps. doing this can cause audible pitch windowing, which is mitigated by leaving the hubert f0 method unchanged. the net result is significantly improved inference quality with little downside.

Bad case source recording:
Stock: The BabelFish
Tuned: The BabelFish
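For readers following along, here is a toy sketch of pitch estimation by normalized cross-correlation. It assumes that "cc" and "ac" in the description above refer to cross-correlation and autocorrelation pitch tracking (Praat offers both, and parselmouth exposes them as `to_pitch_cc` and `to_pitch_ac`). The function name `estimate_f0_cc` and its parameters are hypothetical; this is not Praat's algorithm and not the code changed in this PR. It only illustrates comparing a short reference window against lagged copies of itself.

```python
import math


def estimate_f0_cc(samples, sample_rate, f0_min=110.0, f0_max=500.0):
    """Toy single-frame f0 estimate via normalized cross-correlation.

    Hypothetical helper for illustration only; returns 0.0 when the
    frame carries no energy (treated as unvoiced).
    """
    min_lag = int(sample_rate / f0_max)   # shortest period considered
    max_lag = int(sample_rate / f0_min)   # longest period considered
    window = max_lag                      # one period of the lowest pitch
    if len(samples) < window + max_lag:
        raise ValueError("frame too short for the requested pitch range")

    ref = samples[:window]
    ref_energy = sum(v * v for v in ref)

    best_lag, best_score = None, -1.0
    for lag in range(min_lag, max_lag + 1):
        seg = samples[lag:lag + window]
        seg_energy = sum(v * v for v in seg)
        denom = math.sqrt(ref_energy * seg_energy)
        if denom == 0.0:
            continue  # silent segment, no meaningful correlation
        score = sum(a * b for a, b in zip(ref, seg)) / denom
        if score > best_score:
            best_lag, best_score = lag, score

    return 0.0 if best_lag is None else sample_rate / best_lag


# Synthetic 200 Hz tone sampled at 16 kHz (period of exactly 80 samples).
sample_rate = 16000
tone = [math.sin(2 * math.pi * 200.0 * n / sample_rate) for n in range(400)]
f0 = estimate_f0_cc(tone, sample_rate)
silence_f0 = estimate_f0_cc([0.0] * 400, sample_rate)
```

Because each lagged segment is normalized by its own energy, the comparison needs only a short reference window plus the lag range. That is one plausible reason a cross-correlation method could behave better at short time steps, though the thread itself does not spell this out.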

soliloqueen commented 1 year ago

i can't open this as an issue or message you, so i'm writing here. based on my observations as someone who works in voice, i described the problem that i think is causing the leaking in sovits, and i proposed a solution: reduce the amount of knowledge sovits has about the input audio, so that it has to fill in the gaps with speaker-correct information. someone on /ppp/ said that the correct way to phrase my idea was "culling the phoneme posteriorgram to the top 2 phonemes". after googling it, that seems like a correct description of the fix that i understand in my head but struggle to explain because i lack the vocabulary. is trying something like this within your power?

soliloqueen commented 1 year ago

@effusiveperiscope

effusiveperiscope commented 1 year ago

Noted; issues have been opened, thanks. I'm not an ML researcher or an expert in SVS--I'm really just an end user. I'm not sure what a "phoneme posteriorgram" is, and it's not immediately obvious to me where it occurs in the code. If no one else is willing to do it, I could give it a shot, but it would probably be more efficient to ask someone who actually understands this stuff.
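As a rough illustration of the phrase quoted above: a phoneme posteriorgram is usually a frames × phonemes table of per-frame probabilities, and "culling to the top 2" would mean keeping only the two most likely phonemes in each frame. The sketch below assumes that reading. The function name `cull_top_k` and the renormalization step are illustrative choices, and nothing here is taken from the so-vits-svc code, whose content features may not be an explicit posteriorgram at all.

```python
def cull_top_k(posteriorgram, k=2):
    """Keep the k largest probabilities per frame, zero the rest,
    and renormalize each frame so it sums to 1 again.

    `posteriorgram` is a list of frames, each a list of per-phoneme
    probabilities. Hypothetical helper for illustration only.
    """
    culled = []
    for frame in posteriorgram:
        # Indices of the k most probable phonemes in this frame.
        keep = sorted(range(len(frame)), key=lambda i: frame[i], reverse=True)[:k]
        total = sum(frame[i] for i in keep)
        new_frame = [0.0] * len(frame)
        for i in keep:
            new_frame[i] = frame[i] / total if total > 0 else 0.0
        culled.append(new_frame)
    return culled


# Two frames over a made-up inventory of four phonemes.
frames = [
    [0.70, 0.20, 0.05, 0.05],
    [0.10, 0.25, 0.60, 0.05],
]
culled = cull_top_k(frames, k=2)
```

In the first frame only the first two entries survive; in the second, only the second and third. The discarded low-probability mass is the "knowledge about the input audio" that the proposal would withhold from the model.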