Open · junseokoh1 opened this issue 2 years ago
Also, there are about 400 utterances for each speaker.
In the make_metadata.py code, to use all of the data, I replaced
https://github.com/auspicious3000/autovc/blob/79dda70cff8e4e15e634f64dd7364c6a090b799b/make_metadata.py#L42-L47
with
melsp = torch.from_numpy(tmp[np.newaxis, :, :]).cuda()
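For reference, here is a minimal sketch of what that change amounts to (the function and variable names are my own, not the repo's exact code; `encoder` stands for the speaker encoder C in make_metadata.py): the speaker embedding is computed from each utterance's full-length mel-spectrogram instead of a random len_crop-frame crop.

```python
import numpy as np
import torch

def speaker_embedding_full_length(encoder, mel_paths, device="cuda"):
    """Average speaker embedding over utterances, using full-length mels.

    encoder  : pretrained speaker encoder (C in make_metadata.py)
    mel_paths: list of .npy mel-spectrogram files for one speaker
    """
    embs = []
    for path in mel_paths:
        tmp = np.load(path)  # shape: (n_frames, n_mels)
        # feed the whole utterance instead of a random len_crop-frame crop
        melsp = torch.from_numpy(tmp[np.newaxis, :, :]).float().to(device)
        with torch.no_grad():
            emb = encoder(melsp)
        embs.append(emb.squeeze().cpu().numpy())
    return np.mean(embs, axis=0)
```

This assumes the speaker encoder accepts variable-length inputs; if it does not, some cropping or padding would still be needed.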
You could just skip the shorter utterances.
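For example, a small sketch of what that skip could look like when selecting utterances for the speaker embedding (the helper name and the num_uttrs value are placeholders; len_crop = 128 is the default mentioned in this thread):

```python
import numpy as np

def pick_long_utterances(mel_paths, len_crop=128, num_uttrs=10, seed=None):
    """Sample num_uttrs utterances, skipping any shorter than len_crop frames."""
    rng = np.random.default_rng(seed)
    long_enough = [p for p in mel_paths if np.load(p).shape[0] >= len_crop]
    if len(long_enough) < num_uttrs:
        raise ValueError(f"only {len(long_enough)} utterances have >= {len_crop} frames")
    idx = rng.choice(len(long_enough), size=num_uttrs, replace=False)
    return [long_enough[i] for i in idx]
```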
Thanks for your nice project and for letting us use it.
I guess the pre-trained model that this project provides is only trained for validation with 4 speakers (p225, p226, p227, p228).
So, to reproduce the results of the paper, I downloaded the VCTK dataset and, as the paper says, picked 40 speakers for zero-shot training, which is p225 ~ p269 (maybe 40 ~ 43 speakers; I also changed the sampling rate to 16 kHz).
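In case it helps, a hypothetical resampling step (librosa and soundfile are my choice here, not necessarily what the repo's preprocessing uses; the paths are illustrative):

```python
import os
import librosa
import soundfile as sf

# resample one VCTK wav from 48 kHz to 16 kHz before computing mel-spectrograms
wav, _ = librosa.load("VCTK/wav48/p225/p225_001.wav", sr=16000)
os.makedirs("wavs16k/p225", exist_ok=True)
sf.write("wavs16k/p225/p225_001.wav", wav, 16000)
```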
When making train.pkl with make_metadata.py, there is an error with len_crop:
ValueError: 'a' cannot be empty unless no samples are taken
The default value of len_crop is 128, but some utterances in VCTK are shorter than that.
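For what it's worth, that error text is what NumPy raises when np.random.choice is handed an empty array; my reading is that the retry loop in make_metadata.py runs out of candidate utterances once every remaining file for a speaker has fewer than len_crop frames. A tiny illustration (not the repo's code):

```python
import numpy as np

candidates = np.array([], dtype=int)  # no utterances with >= len_crop frames left
np.random.choice(candidates)  # ValueError: 'a' cannot be empty unless no samples are taken
```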
I want to know whether you just removed the len_crop part or only used VCTK utterances longer than 128 frames.
Thanks for reading this issue.