auspicious3000 / autovc

AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
https://arxiv.org/abs/1905.05879
MIT License

How to use this repo for just testing? #67

Open sandeshnaroju opened 3 years ago

sandeshnaroju commented 3 years ago

I just want to play with this repo; I don't want to train/build anything, just use it a few times. Are there any instructions on how to do that?

ruclion commented 3 years ago

Just do what the README says:

0. Convert Mel-Spectrograms: Download the pre-trained AutoVC model, and run conversion.ipynb in the same directory.

1. Mel-Spectrograms to waveform: Download the pre-trained WaveNet vocoder model, and run vocoder.ipynb in the same directory.

Please note the training metadata and testing metadata have different formats.
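Roughly, the difference looks like this (a sketch based on my reading of make_spect.py, make_metadata.py, and conversion.ipynb; the file names and shapes are illustrative, not authoritative):

import numpy as np

# Training metadata (what the stock make_metadata.py writes): per speaker, the
# speaker id, a 256-dim speaker embedding, and the relative paths of that
# speaker's mel-spectrogram .npy files.
train_entry = ['p225', np.zeros(256, dtype=np.float32),
               'p225/p225_001.npy', 'p225/p225_002.npy']

# Testing metadata (what conversion.ipynb expects, e.g. metadata.pkl): per
# speaker, the speaker id, a 256-dim speaker embedding, and the mel-spectrogram
# array(s) themselves, each of shape (frames, 80).
test_entry = ['p225', np.zeros(256, dtype=np.float32),
              np.zeros((90, 80), dtype=np.float32)]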

ghost commented 3 years ago

And how do I run inference after that?

ruclion commented 3 years ago

The important thing is to get "metadata.pkl". You can generate it by running make_spect.py and then python make_metadata.py. If you run them as-is, they use the author's wavs; if you swap in your own wavs, the resulting metadata.pkl is built from your wavs. Then read the code in conversion.ipynb and run it.
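To check what your preprocessing run actually produced, a quick inspection like this can help (the path below is an assumption; point it at whichever pickle your run wrote):

import pickle

# Print each entry's structure: string paths mean training-style metadata,
# numpy arrays mean the testing-style metadata that conversion.ipynb expects.
with open('./spmel/train.pkl', 'rb') as f:
    entries = pickle.load(f)

for entry in entries:
    speaker_id, speaker_emb = entry[0], entry[1]
    print(speaker_id, speaker_emb.shape, [type(x).__name__ for x in entry[2:]])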

ghost commented 3 years ago

python make_metadata.py does NOT generate "metadata.pkl". You can check the code.

aneybaby727 commented 3 years ago

@ruclion I have the same problem: make_metadata.py does NOT generate "metadata.pkl".

hongchengzhu commented 3 years ago

@ruclion I have the same problem: make_metadata.py does NOT generate "metadata.pkl".

python make_metadata.py does NOT generate "metadata.pkl". You can check the code.

Hello, I ran into the same problem. Could you please share how you solved it? Thank you in advance!

hongchengzhu commented 3 years ago

I just want to play with this repo; I don't want to train/build anything, just use it a few times. Are there any instructions on how to do that?

Have you solved this problem? Could you share the solution, please? Thank you.

atravler commented 3 years ago

I just want to play with this repo; I don't want to train/build anything, just use it a few times. Are there any instructions on how to do that?

Have you solved this problem? Could you share the solution, please? Thank you.

Same here. Do you have any solution?

jlian2 commented 2 years ago

If you put only one wav file into each speaker directory, this modified make_metadata.py should work:

import os
import pickle
from collections import OrderedDict

import numpy as np
import torch

from model_bl import D_VECTOR

# Load the pre-trained speaker encoder (requires a GPU; swap .cuda() for .cpu() otherwise).
C = D_VECTOR(dim_input=80, dim_cell=768, dim_emb=256).eval().cuda()
c_checkpoint = torch.load('3000000-BL.ckpt')
new_state_dict = OrderedDict()
for key, val in c_checkpoint['model_b'].items():
    new_key = key[7:]  # strip the 'module.' prefix left by DataParallel
    new_state_dict[new_key] = val
C.load_state_dict(new_state_dict)

num_uttrs = 1
len_crop = 128

# Directory containing the mel-spectrograms produced by make_spect.py
rootDir = './spmel'
dirName, subdirList, _ = next(os.walk(rootDir))
print('Found directory: %s' % dirName)

speakers = []

for speaker in sorted(subdirList):
    # NOTE: this keeps only 4-character speaker directories (VCTK-style names
    # such as 'p225'); remove or adapt the check for your own folder names.
    if len(speaker) != 4:
        continue
    print('Processing speaker: %s' % speaker)
    utterances = []
    utterances.append(speaker)
    _, _, fileList = next(os.walk(os.path.join(dirName, speaker)))

    # Pick num_uttrs random utterances for this speaker.
    idx_uttrs = np.random.choice(len(fileList), size=num_uttrs, replace=False)

    embs = []
    mel_specs = []
    for i in range(num_uttrs):
        tmp = np.load(os.path.join(dirName, speaker, fileList[idx_uttrs[i]]))
        melsp = torch.from_numpy(tmp).cuda().unsqueeze(0)
        emb = C(melsp)
        embs.append(emb.detach().squeeze().cpu().numpy())
        mel_specs.append(melsp.squeeze(0))

    # Speaker embedding, averaged over the selected utterances.
    utterances.append(np.mean(embs, axis=0))

    # Unlike the stock make_metadata.py, store the mel-spectrograms themselves
    # (not file paths); this is the format conversion.ipynb expects.
    for mel_spec in mel_specs:
        utterances.append(mel_spec.cpu().numpy())
    speakers.append(utterances)

print("number of speakers:", len(speakers))

with open('metadata_own.pkl', 'wb') as handle:
    pickle.dump(speakers, handle)
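After generating metadata_own.pkl, the conversion itself follows conversion.ipynb. A condensed sketch (the Generator(32, 256, 512, 32) hyperparameters, the autovc.ckpt checkpoint name, and the output indexing reflect my reading of the repo's notebook, so double-check them against your copy):

import pickle
from math import ceil

import numpy as np
import torch

from model_vc import Generator

def pad_seq(x, base=32):
    # Pad the mel-spectrogram so its length is a multiple of the model's downsampling factor.
    len_out = int(base * ceil(float(x.shape[0]) / base))
    len_pad = len_out - x.shape[0]
    return np.pad(x, ((0, len_pad), (0, 0)), 'constant'), len_pad

device = 'cuda:0'
G = Generator(32, 256, 512, 32).eval().to(device)
g_checkpoint = torch.load('autovc.ckpt', map_location=device)  # pre-trained AutoVC checkpoint
G.load_state_dict(g_checkpoint['model'])

metadata = pickle.load(open('metadata_own.pkl', 'rb'))

spect_vc = []
for src in metadata:
    x_org, len_pad = pad_seq(src[2])
    uttr_org = torch.from_numpy(x_org[np.newaxis, :, :]).to(device)
    emb_org = torch.from_numpy(src[1][np.newaxis, :]).to(device)
    for trg in metadata:
        emb_trg = torch.from_numpy(trg[1][np.newaxis, :]).to(device)
        with torch.no_grad():
            _, x_identic_psnt, _ = G(uttr_org, emb_org, emb_trg)
        # Depending on the repo version the postnet output is (B, 1, T, 80) or (B, T, 80);
        # drop the second 0 index if you hit a shape error here.
        uttr_trg = x_identic_psnt[0, 0].cpu().numpy()
        if len_pad > 0:
            uttr_trg = uttr_trg[:-len_pad, :]
        spect_vc.append(('{}x{}'.format(src[0], trg[0]), uttr_trg))

# results.pkl is what vocoder.ipynb reads to synthesize waveforms with the
# pre-trained WaveNet vocoder.
with open('results.pkl', 'wb') as handle:
    pickle.dump(spect_vc, handle)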

dragen1860 commented 2 years ago

How do I use my own source content wav and target style wav? Thank you.

Ha0Tang commented 2 years ago

@dragen1860 have you fixed the issue?