auspicious3000 / autovc

AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
https://arxiv.org/abs/1905.05879
MIT License
990 stars · 205 forks

How to test on my own data? #108

Open Ha0Tang opened 2 years ago

Ha0Tang commented 2 years ago

How to test on my own data? I have a "Source Speaker / Speech" and a "Target Speaker / Speech", I want to generate the "Conversion", as shown on the demo page https://auspicious3000.github.io/autovc-demo/. Can anyone provide some instructions?

auspicious3000 commented 2 years ago

You can follow the code in conversion.ipynb

ljc222 commented 2 years ago

how to generate the metadata.pkl file?

lisabecker commented 2 years ago

I'm also interested in generating a metadata.pkl from my own data for inference. @auspicious3000 do you happen to have the script that produces the metadata.pkl that's used for inference, not for training?

Simply adding the mel spectrograms produced by make_spect.py to the speaker lists in make_metadata.py (in place of the file paths) does not produce sensible results with conversion.ipynb; the output is just noise.

auspicious3000 commented 2 years ago

Each metadata entry is a list of [filename, speaker embedding, spectrogram]
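For anyone stuck on this: the entry format above can be assembled with a few lines of Python. This is a hedged sketch, not the author's script; the dummy arrays stand in for a real speaker embedding (which make_metadata.py obtains from the speaker encoder, typically a 256-dim vector) and a real mel spectrogram from make_spect.py (an array of shape (n_frames, 80)).

```python
import pickle
import numpy as np

def build_metadata(entries):
    """entries: iterable of (name, embedding, spectrogram) triples.

    Returns a list of [filename, speaker embedding, spectrogram] lists,
    the structure conversion.ipynb unpickles from metadata.pkl.
    """
    return [[name, np.asarray(emb), np.asarray(spect)]
            for name, emb, spect in entries]

# Dummy stand-ins: a 256-dim speaker embedding and a 128-frame,
# 80-bin mel spectrogram. Replace with outputs of the speaker
# encoder and make_spect.py for real use.
emb = np.random.rand(256).astype(np.float32)
spect = np.random.rand(128, 80).astype(np.float32)

metadata = build_metadata([("p225", emb, spect)])
with open("metadata.pkl", "wb") as f:
    pickle.dump(metadata, f)
```

The speaker name and array shapes here are illustrative; what matters is that each pickled entry keeps the [filename, embedding, spectrogram] order the notebook expects.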

lisabecker commented 2 years ago

@ljc222 @Ha0Tang if you go through past issues, you'll stumble across this repo/notebook, which pieces it all together to make it work end-to-end: https://github.com/KnurpsBram/AutoVC_WavenetVocoder_GriffinLim_experiments/blob/master/AutoVC_WavenetVocoder_GriffinLim_experiments_17jun2020.ipynb

Hope this helps!

arthurwolf commented 7 months ago

> You can follow the code in conversion.ipynb

I looked at https://github.com/auspicious3000/autovc/blob/master/conversion.ipynb and I have zero idea how that helps with this.

I just need some way to, on the command line, provide the original voice as a wav/mp3, provide the file to change the voice of as wav/mp3, and get the output file with the voice changed written to disk.

How do I do that? How does anyone ever use this if something this basic isn't documented? Am I missing something obvious?

Thanks a lot to anyone with any information.

jvel07 commented 5 months ago

> @ljc222 @Ha0Tang if you go through past issues, you'll stumble across this repo/notebook, which pieces it all together to make it work end-to-end: https://github.com/KnurpsBram/AutoVC_WavenetVocoder_GriffinLim_experiments/blob/master/AutoVC_WavenetVocoder_GriffinLim_experiments_17jun2020.ipynb

@lisabecker hi! How do you generate the metadata for inference? make_metadata.py generates metadata for training; or is this script, in the end, meant to generate metadata for both training and inference (just pointed at a different mel-spectrogram path)?