Open leminhnguyen opened 4 months ago
@OlaWod Did you compare the results with other models like FreeVC and KNN-VC?
| src | vctk | libritts | esd (emotive, en) | esd (neutral, zh) |
| --- | --- | --- | --- | --- |
| knnvc (5 min matching set) | 82.50% | 86.53% | 82.95% | 82.95% |
| freevc | 83.89% | 87.34% | 83.14% | 83.77% |
| pitchvc | 84.96% | 87.96% | 83.14% | 83.29% |
(Source wavs sampled from the datasets in the header row, 500 wavs each, converted to 12 seen speakers.)
@OlaWod I see that this model outperforms both KNN-VC and Phoneme Hallucinator in intelligibility (WER). That is amazing!
- How did you achieve that? From my understanding, this model builds on FreeVC; could you describe the changes in detail?
- What do you think about the cross-lingual results?
Thank you, I'll try this model!
When training the model, I encountered this error:
```
Traceback (most recent call last):
  File "/home/lmnguyen/PitchVC/train.py", line 325, in <module>
    main()
  File "/home/lmnguyen/PitchVC/train.py", line 321, in main
    train(0, a, h)
  File "/home/lmnguyen/PitchVC/train.py", line 154, in train
    spec, phase = generator(x, mel, spk_emb, spk_id)
  File "/home/lmnguyen/miniconda3/envs/voice-conversion/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/lmnguyen/miniconda3/envs/voice-conversion/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/lmnguyen/PitchVC/models.py", line 484, in forward
    g = self.embed_spk(spk_id).transpose(1, 2)
  File "/home/lmnguyen/miniconda3/envs/voice-conversion/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/lmnguyen/miniconda3/envs/voice-conversion/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/lmnguyen/miniconda3/envs/voice-conversion/lib/python3.9/site-packages/torch/nn/modules/sparse.py", line 163, in forward
    return F.embedding(
  File "/home/lmnguyen/miniconda3/envs/voice-conversion/lib/python3.9/site-packages/torch/nn/functional.py", line 2237, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
Any suggestion, @OlaWod?
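Because CUDA kernel errors are reported asynchronously, one way to surface the real error behind a device-side assert is to run the failing op on CPU (or set `CUDA_LAUNCH_BLOCKING=1`). A minimal sketch, assuming the cause is a speaker id outside the embedding table (the names below are illustrative, not PitchVC's actual code):

```python
import torch
import torch.nn as nn

# Minimal CPU reproduction: an embedding table sized for 108 speakers,
# indexed with a speaker id >= 108.
embed_spk = nn.Embedding(num_embeddings=108, embedding_dim=256)
spk_id = torch.tensor([[150]])  # out of range for a 108-entry table

try:
    g = embed_spk(spk_id)
    error = None
except IndexError as e:  # on CUDA this surfaces as a device-side assert instead
    error = e

print(error)
```

On CPU the out-of-range lookup raises a plain `IndexError`, which points directly at the bad index instead of the opaque CUDA assert.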
Solved the problem! @OlaWod Could you share the pretrained `do_*` model to finetune?
> Solved the problem! @OlaWod Could you share the pretrained `do_*` model to finetune?
i am away these days, will do it when i'm back home
> Solved the problem! @OlaWod Could you share the pretrained `do_*` model to finetune?
the corresponding `do_0070000` checkpoint in the `exp/default` dir was deleted. i put another checkpoint in the `exp/test` dir. they are not much different, just trained on different machines.
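Loading a shared checkpoint to resume finetuning can be sketched as below. This is a hedged illustration: the dict key `"steps"` and the dummy file are stand-ins, not the actual contents of PitchVC's `do_*` files (which hold discriminator/optimizer state in HiFi-GAN-style trainers).

```python
import os
import tempfile

import torch

# Simulate a saved checkpoint (stand-in for a real do_* file).
path = os.path.join(tempfile.mkdtemp(), "do_dummy")
torch.save({"steps": 70000}, path)

# Load on CPU first; move states to the training device afterwards.
ckpt = torch.load(path, map_location="cpu")
print(ckpt["steps"])
```

`map_location="cpu"` avoids device-mismatch errors when the checkpoint was saved on a different GPU setup, which matters here since the two checkpoints were trained on different machines.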
> When training the model, I encountered this error:
> `RuntimeError: CUDA error: device-side assert triggered`
> Any suggestion, @OlaWod?
i suppose it's because i hardcoded the number of speakers in `nn.Embedding` to 108 here; do you have more than 108 speakers in your data?
@OlaWod You're correct, I have more speakers, so I changed the 108 to my number of speakers.
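The fix can be sketched as sizing the embedding table from the data instead of hardcoding 108. A minimal illustration (`speaker_ids` and the dimensions are assumptions, not PitchVC's actual config):

```python
import torch
import torch.nn as nn

# Size the speaker embedding from the dataset, not a hardcoded constant.
speaker_ids = list(range(120))               # suppose the data has 120 speakers
num_speakers = len(set(speaker_ids))

embed_spk = nn.Embedding(num_embeddings=num_speakers, embedding_dim=256)

spk_id = torch.tensor([[num_speakers - 1]])  # the highest id now indexes safely
g = embed_spk(spk_id).transpose(1, 2)        # (1, 1, 256) -> (1, 256, 1), as in models.py
print(g.shape)
```

Note that finetuning a pretrained checkpoint after changing `num_embeddings` will shape-mismatch on the `embed_spk.weight` tensor, so that entry of the state dict would need to be skipped or resized.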
no.