OlaWod / FreeVC

FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion
MIT License

Data_utils.py changes #35

Closed: steven850 closed this issue 1 year ago

steven850 commented 1 year ago

Since you changed data_utils.py the other day, I've noticed that the model seems to be training on much shorter segments.

The eval audios are also all under 1 second long, and most of them are just 40 frames. I'm also noticing some obvious gaps/blurs in the generated and GT mels. You can see the "blurry spots" in these generated mels, where the formants just disappear.
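To relate "40 frames" to "under 1 second": the clip duration is frames × hop length / sampling rate. The sketch below assumes a 16 kHz sampling rate and a hop length of 320 samples; these are assumed values, so check them against the `data` section of the config you are actually training with.

```python
def frames_to_seconds(n_frames: int, hop_length: int = 320, sampling_rate: int = 16000) -> float:
    """Convert a mel frame count to seconds.

    hop_length=320 and sampling_rate=16000 are assumed defaults, not values
    read from the repository's config.
    """
    return n_frames * hop_length / sampling_rate


print(frames_to_seconds(40))  # 40 frames * 320 samples / 16000 Hz
```

Under these assumptions, a 40-frame clip is 0.8 s, which matches the observation that the eval audios are all shorter than 1 second.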

[Images: generated vs. ground-truth mel spectrograms, "genvGT david" and "genvGT david 2"]

OlaWod commented 1 year ago

As training goes on, the model's synthesis ability will improve and the synthesized mels will be less over-smoothed. 100k steps (batch size 64) is still an early stage of training. And yes, this data_utils.py (which follows the logic of my uncleaned code) uses much shorter segments. That's why I wrote the old, problematic data_utils.py, which concatenates short segments so that a batch contains more speech data. But several days ago I found that this concatenation is harmful to speech quality, even though it converges faster (the over-smoothness goes away sooner) because each batch contains more data.
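The trade-off described above can be illustrated with a toy comparison of two ways to fill a batch. This is a hypothetical sketch, not the code in data_utils.py: utterances are stand-in lists of frame indices, and `crop_per_utterance`, `crop_after_concat`, and `speech_frames` are illustrative names. Per-utterance cropping leaves short utterances short (so the padded batch holds less real speech). Concatenating first fills every segment completely, but some segments then span the join between two different utterances.

```python
import random
from typing import List, Sequence

Frames = List[int]  # hypothetical stand-in for a sequence of mel frames


def crop_per_utterance(utts: Sequence[Frames], seg: int, rng: random.Random) -> List[Frames]:
    """Crop each utterance independently to at most `seg` frames."""
    out = []
    for u in utts:
        if len(u) <= seg:
            out.append(list(u))  # shorter than a segment: kept whole, padded later
        else:
            start = rng.randint(0, len(u) - seg)  # random window inside the utterance
            out.append(list(u[start:start + seg]))
    return out


def crop_after_concat(utts: Sequence[Frames], seg: int) -> List[Frames]:
    """Join all utterances, then cut full `seg`-frame segments (leftover tail dropped)."""
    joined = [f for u in utts for f in u]
    return [joined[i:i + seg] for i in range(0, len(joined) - seg + 1, seg)]


def speech_frames(batch: Sequence[Frames]) -> int:
    """Count real (non-padding) frames in a batch."""
    return sum(len(x) for x in batch)


utts = [list(range(40)), list(range(60)), list(range(300))]
a = crop_per_utterance(utts, 128, random.Random(0))
b = crop_after_concat(utts, 128)
print([len(x) for x in a], speech_frames(a))  # item lengths and total speech frames
print([len(x) for x in b], speech_frames(b))
```

With these toy lengths (40, 60, and 300 frames, 128-frame segments), per-utterance cropping yields items of 40, 60, and 128 frames (228 speech frames across 3 items). Concatenation yields three full 128-frame items (384 speech frames). That is more data per batch, at the cost of segments that mix utterances.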