SayaSS / vits-finetuning

Fine-Tuning your VITS model using a pre-trained model
MIT License
551 stars, 86 forks

chore(readme): add datasets detail in readme #26

Closed: alfonks closed 1 year ago

alfonks commented 1 year ago

For now, the spectrogram scale isn't specified. I'm using Audacity to export the audio file. By default the scale it uses is Mel, and it turns out the tensor padding won't work. When I check the tensor shape, it looks like this: (image)

Meanwhile, the data provided in the repository has a shape like this: (image)

In my case, I used Audacity to export the .wav file. After turning on the Multi-view setting for the track and changing its spectrogram scale to Linear, I was able to get a shape similar to the audio files provided in the repository. (image)

SayaSS commented 1 year ago

Thank you for your help! But the scale of the spectrogram view in Audacity is just a different display scale and does not change the file itself. You just need to convert the audio to mono. (image)
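For anyone who would rather script the mono conversion than do it in Audacity, here is a minimal sketch using only Python's standard library. The function name `downmix_to_mono` and the file names are hypothetical and not part of this repository, and the sketch assumes 16-bit PCM WAV input. It averages the channels of each frame, which is one common way to downmix and may not match what Audacity does exactly.

```python
import array
import sys
import wave


def downmix_to_mono(src_path, dst_path):
    """Average all channels of a 16-bit PCM WAV file into one mono channel.

    Hypothetical helper for illustration; returns the original channel count.
    """
    with wave.open(src_path, "rb") as src:
        n_channels = src.getnchannels()
        sample_width = src.getsampwidth()
        frame_rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # Assumption: 16-bit samples. Other widths need different handling.
    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit PCM WAV files")

    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    # Samples are interleaved (L, R, L, R, ...); average each frame.
    mono = array.array(
        "h",
        (
            sum(samples[i:i + n_channels]) // n_channels
            for i in range(0, len(samples), n_channels)
        ),
    )
    if sys.byteorder == "big":
        mono.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(frame_rate)
        dst.writeframes(mono.tobytes())

    return n_channels
```

Calling `downmix_to_mono("input.wav", "input_mono.wav")` writes a single-channel copy and returns how many channels the input had, which is a quick way to confirm whether a file was stereo in the first place.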

alfonks commented 1 year ago

OK, noted. I guess this isn't needed then. Thank you!