auspicious3000 / autovc

AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
https://arxiv.org/abs/1905.05879
MIT License

Differences in mel-spectrogram #97

Open amiteliav opened 2 years ago

amiteliav commented 2 years ago

Hi

I'm working with your repo. It is really good, thanks!

I'm trying to generate my own mel-spectrogram with your code in "make_spect.py". Here is the demo mel-spectrogram (p225_003): Demo

Here is my mel-spectrogram (p225_003): my

The sizes are not the same. Demo: (376, 80). Mine: (475, 80).

You can also see that the spectrograms don't look the same: the demo covers the whole range of the spectrogram, whereas mine doesn't. Mine has the same overall shape but looks more compressed.

When I use the demo spectrogram, the conversion works. When I use my spectrogram, it doesn't.

Any idea why the spectrograms are different, and how to correct it?

Thanks, Amit

auspicious3000 commented 2 years ago

Your frequency axis and time axis are swapped.
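A minimal sketch of what the swap means in practice, assuming the spectrogram is a NumPy array stored as (frames, 80) as in the shapes reported above. The helper name `to_freq_by_time` is hypothetical and not part of the repo; it only reorients the array so that frequency is on the first axis for plotting.

```python
import numpy as np

def to_freq_by_time(mel, n_mels=80):
    """Return the spectrogram with shape (n_mels, frames).

    Assumption: the array is 2-D and one of its axes has length n_mels.
    A square (n_mels, n_mels) input is ambiguous and is treated as
    (frames, n_mels), so it gets transposed.
    """
    mel = np.asarray(mel)
    if mel.ndim != 2:
        raise ValueError("expected a 2-D spectrogram")
    if mel.shape[1] == n_mels:
        # Stored as (frames, n_mels): swap so frequency is the first axis.
        return mel.T
    if mel.shape[0] == n_mels:
        # Already (n_mels, frames): nothing to do.
        return mel
    raise ValueError("neither axis has length n_mels")

# Stand-ins with the shapes reported in this thread.
demo = np.zeros((376, 80))
mine = np.zeros((475, 80))
print(to_freq_by_time(demo).shape, to_freq_by_time(mine).shape)
```

Only the orientation changes here, so the frame counts (376 and 475) still differ after the swap.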

amiteliav commented 2 years ago

Thanks, you are right, the axes were swapped. These are the new plots:

Demo: Demo

Mine, made with make_spect: my

But there are still some differences. The sizes of the spectrograms are not the same. Demo: (80, 376). Mine, made with make_spect: (80, 475).

I used the code in make_spect.py, so I thought I should get the same results as the demo. When I use these spectrograms, the conversion results are very different: with the demo spectrogram I get a good conversion, but with the spectrogram I created with make_spect I get a very unclear result. I hope you can help me understand why, because I can't get the model to convert new files that are not from the demo dataset :/

Thanks

auspicious3000 commented 2 years ago

They should only differ by the amount of silence before and after. Please confirm if this is true.
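One rough way to check this, sketched under assumptions: both spectrograms are NumPy arrays of shape (frames, 80) with values scaled to roughly [0, 1], and a frame counts as silent when its mean value falls below a threshold. The function names, the threshold, and the tolerance are all hypothetical and would need tuning on real data; none of this comes from the repo.

```python
import numpy as np

def trim_silent_frames(mel, threshold=0.1):
    """Drop leading and trailing frames whose mean value is <= threshold.

    Assumption: mel has shape (frames, n_mels) and silence maps to low values.
    """
    mel = np.asarray(mel, dtype=float)
    energy = mel.mean(axis=1)                 # one value per frame
    active = np.flatnonzero(energy > threshold)
    if active.size == 0:
        return mel[:0]                        # everything is silent
    return mel[active[0]:active[-1] + 1]

def differ_only_by_silence(a, b, threshold=0.1, tol=0.05):
    """True if a and b roughly match once edge silence is trimmed."""
    ta = trim_silent_frames(a, threshold)
    tb = trim_silent_frames(b, threshold)
    if ta.shape != tb.shape:
        return False
    return bool(np.mean(np.abs(ta - tb)) <= tol)

# Synthetic stand-ins: identical "speech" with different amounts of padding.
speech = np.full((10, 80), 0.8)
demo = np.vstack([np.zeros((3, 80)), speech, np.zeros((2, 80))])
mine = np.vstack([np.zeros((20, 80)), speech, np.zeros((30, 80))])
print(differ_only_by_silence(demo, mine))
```

On real recordings the trimmed lengths may still be off by a few frames, so a mismatch from this check is a hint to look closer rather than proof that the spectrograms differ in content.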

MHVali commented 1 year ago

@amiteliav @auspicious3000 Hi, I am trying to get good conversion quality with this repo, but I cannot. Could you please let me know which hyper-parameters you use for "dim_neck", "freq", "batch_size", and "num_itrs"? I am using the small dataset that is provided in this repo. Could you please also let me know whether you use any other dataset that gives you a good conversion? Thanks in advance!